Re-construction  and  Weighting 




Re-construction  of  Reference  Population  and  Generating  Weights  by  Decision  Tree1 


Wei  Wan,  Ph.D. 

Junior  Faculty  Research  Fellow,  DEOMI 
Summer  2017 

Claflin  University 
Orangeburg,  SC  29115 


DEFENSE  EQUAL  OPPORTUNITY  MANAGEMENT  INSTITUTE 

RESEARCH, DEVELOPMENT, AND STRATEGIC INITIATIVES DIRECTORATE, Patrick AFB, FL


Submitted  to  Dr.  Daniel  P.  McDonald,  Executive  Director  of  Research 

July 21, 2017


Technical  Report  #11-17 

1 The opinions expressed in this report are those of the author and should not be construed to represent
the official position of the U.S. military services, the Department of Defense, or DEOMI.




Abstract 

The DEOCS receives responder data, which contains no records for non-respondents, directly through the survey, and unit population data through DMDC. To estimate statistical characteristics of the population, the DEOCS team merged the unit population data into the survey data, producing a dataset of approximately 260,000 cases. However, the non-response rate is more than 60%, so the responder data may not be representative of the population. Weighting is needed to compensate for non-response and avoid bias. Computing post-stratification weights takes three steps: first, design and implement an algorithm in Python to re-construct the population; second, compute the weights; third, weight the response cases and analyze them. Two methods were adopted to compute the weights. The first method computes post-stratification weights from crosstabs and is used to compute two types of weights: Type 1 weights with respect to the unit reference population, and Type 2 weights with respect to the whole reference population. The second method uses a Logistic Regression approach. The SPSS decision tree module with CHAID was used to compute the probabilities that serve as predictors for the Logistic Regression. Finally, we compare the effects of the weights from these two methods on the distribution of variables.

1. Introduction

Purpose/Statement of Problem: The DEOCS team has merged the unit population data into the survey data. The result is a dataset of 241,027 cases and 310 variables, shown in Table 1. Each row corresponds to one responder. The data in the red dotted box is the group x gender crosstab, which is the reference population of the unit.


Table 1: Original Dataset (table content not recoverable)


In this dataset, there are several demographic variables, but we are interested in only two: “group”, which is the rank of the responder, and “gender”. The variable information is as follows:

Table 2: Variable-Group

Value    Label
1        E1-E3
2        E4-E6
3        E7-E9
4        W1-W5
5        O1-O3
6        O4-O6
7        Grade 1-8
8        Grade 9-15 & SES
10       Other

Table 3: Variable-Gender

Value    Label
1        Male
2        Female

However, a large share of the population behind the above dataset did not respond: the response rate is low (< 40%). How can we make our analysis meaningful for the whole population based on those responses? In other words, how can we generalize our results to the whole population? Having a representative sample of the population is of paramount importance. It is not unusual for certain demographic characteristics to be distributed differently in the sample than in the population. This difference introduces bias into any estimate we obtain from the sample data, because statistical procedures will give greater weight to the oversampled people.

Design/methodology/approach: The bias described above is addressed with post-stratification weights. However, in order to calculate a post-stratification weight, we need an auxiliary dataset, the reference population, to which we can compare the sample data. The dataset of this project has the reference population at the unit level for group and gender, which is in the red dotted box in Table 1.

The first and most important step of the project is to reconstruct the whole population from the unit reference population, using Python in SPSS.
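
As a rough illustration of what such a reconstruction could look like, the sketch below generates one non-responder row for every person in a unit's group x gender reference count who is not matched by a responder row. The data layout, the function name `reconstruct_population`, and the handling of missing or inconsistent counts are assumptions made for illustration; this is not the report's actual SPSS/Python code. The indicator follows the coding given in this section (1 for responders, 2 for non-responders).

```python
from collections import Counter

def reconstruct_population(responders, unit_population):
    """Return responder rows plus generated non-responder rows.

    responders: list of dicts with keys "unit", "group", "gender".
    unit_population: dict mapping (unit, group, gender) -> head count
        taken from the unit reference population (None if missing).
    """
    responded = Counter((r["unit"], r["group"], r["gender"]) for r in responders)
    rows = [dict(r, indicator=1) for r in responders]  # 1 = responder
    for (unit, group, gender), total in sorted(unit_population.items()):
        if total is None:
            # A missing population count yields no non-responder rows.
            continue
        missing = total - responded[(unit, group, gender)]
        # If the survey holds more responders than the population count,
        # the data are inconsistent; this sketch simply adds nothing.
        for _ in range(max(missing, 0)):
            rows.append({"unit": unit, "group": group, "gender": gender,
                         "indicator": 2})  # 2 = non-responder
    return rows

# Hypothetical unit "U1" with two responders.
responders = [
    {"unit": "U1", "group": 1, "gender": 1},
    {"unit": "U1", "group": 1, "gender": 2},
]
unit_population = {("U1", 1, 1): 3, ("U1", 1, 2): 2, ("U1", 2, 1): None}
population = reconstruct_population(responders, unit_population)
```

In this toy case the reconstructed population has five rows: the two responders plus three generated non-responders.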

The second step is to calculate the weights. Two methods were adopted. The first method computes post-stratification weights from crosstabs and is used to compute two types of weights: Type 1 weights are computed with respect to the unit reference population, and Type 2 weights with respect to the whole reference population. The second method uses a Logistic Regression approach. The SPSS decision tree module was used to compute the probabilities that serve as predictors for the Logistic Regression. The re-constructed population dataset was used to build the decision trees, with two variables selected as inputs. Two techniques were employed to build the trees: chi-square automatic interaction detection (CHAID) and exhaustive chi-square automatic interaction detection (ECHAID). The output of the decision trees was the classification of responders and non-responders based on the independent variables (group and gender), together with the probabilities for responders and non-responders. Next, a Logistic Regression model with these probabilities as input and a binary dependent variable (responders = 1; non-responders = 2) was used to fit a curve to these probabilities. Then, the weights were calculated from the fitted probabilities. Finally, as an optional step, the weights are usually rescaled so that they add up to the number of responding cases.

In the end, we compare the weights from the different methods, weight each case, and analyze the results.

Findings:

1. Python is a necessary tool for the data manipulation in this project.

2. Weights do change the distribution of the survey dataset, but the proper type of weight depends on the original goal of the survey.

3. Missing and inconsistent values lead to errors. A missing value in the population data means that no rows are generated for the non-responses of that unit. An inconsistency between the unit population data and the survey data leads to extra rows for non-responses.

4. A “bigger” decision tree, built with more input variables, is recommended.

5. The weight from the second method is easier to apply, but may be less accurate.

Originality/value:

1. The complexity and size of the data in SPSS determine the complexity of the data manipulation. An algorithm has been designed and implemented in Python to re-construct the whole population. The Python code for this algorithm can be used as a template for similar future work.

2. Once the population data is generated or available, the “decision trees” module is an efficient tool to classify responders and non-responders, compute their probabilities, and compute the weights for each node. The complexity of the decision trees theoretically depends on the number of input variables rather than the number of cases. If more independent variables are needed, other statistical techniques may be adopted to decrease the number of input variables and thereby reduce the complexity of the decision trees.


2.  Weighting  Methods 

2.1 What is a Survey Weight?

When analyzing a survey, having a representative sample of the population is very important. We may accidentally (or sometimes intentionally) oversample some types of people and under-sample others. In other words, the distribution of a certain characteristic of the sample, such as age, rank, race, or gender, may differ from its distribution in the population. Thus, a weight is defined to be a positive value assigned to each respondent record in a survey data file in order to make the weighted records represent the population of inference as closely as possible, so that the analysis results from the response data can be applied to the whole population. For example, a weight of 2 means that the case counts in the dataset as two identical cases, a weight of 1 means that it counts as one case, and a weight of 1/2 means that it counts as half a case.

The weights are usually developed in a series of stages to compensate for unequal selection probabilities, non-response, non-coverage, and sampling fluctuations from known population values [13]. Chronologically, the weighting process can be divided into three stages, as follows [9].

2.2  Typical  Stages  of  Weighting 

The first stage, weighting for unequal selection probabilities, is generally straightforward. Each sampled element (whether respondent or non-respondent) is assigned a base weight that is either the inverse of the element's selection probability or proportional to that inverse. With probability sampling, the selection probabilities are known, and the base weights are generally readily determined. A difficulty with the base weights in this project arises from the lack of a sampling frame and population data, which means that the selection probabilities of the sampled elements are unknown.

The second stage of weight development usually attempts to compensate for unit, or total, non-response. The base weights of responding elements are adjusted to compensate for the non-responding elements. The general strategy is to identify respondents who are similar to the non-respondents in terms of auxiliary information that is available for both respondents and non-respondents, and then to increase the base weights of respondents so that they represent similar non-respondents. In many cases little is known about the non-respondents (often only their stratum and cluster), in which case a simple cell weighting adjustment may be used. In this project, the auxiliary information available or computed for non-respondents is group and gender. Respondents and non-respondents are sorted into weighting cells, and the weights of the respondents in each cell are increased by a multiplying factor so that the respondents represent the non-respondents in that cell. This method works well when there is limited auxiliary information available for the non-respondents. However, when a sizeable amount of auxiliary information is available and the researcher wants to incorporate much of it in the non-response weighting adjustments, alternative methods may be needed.
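
The cell weighting adjustment described above can be illustrated with a short sketch. The records, cell labels, and the function name `cell_adjusted_weights` are hypothetical; the multiplying factor is taken to be the total base weight in a cell divided by the base weight of its respondents, which is one common way to make respondents stand in for non-respondents.

```python
from collections import defaultdict

def cell_adjusted_weights(sample):
    """sample: list of dicts with keys "cell", "responded" (bool), "base_weight".

    Returns the adjusted weights of the respondents, in input order.
    Cells without any respondent are ignored in this sketch.
    """
    total = defaultdict(float)       # base weight of all sampled elements per cell
    responding = defaultdict(float)  # base weight of respondents per cell
    for element in sample:
        total[element["cell"]] += element["base_weight"]
        if element["responded"]:
            responding[element["cell"]] += element["base_weight"]
    # Multiplying factor per cell: respondents represent the non-respondents.
    factor = {cell: total[cell] / responding[cell] for cell in responding}
    return [e["base_weight"] * factor[e["cell"]] for e in sample if e["responded"]]

sample = [
    {"cell": ("E1-E3", "Male"), "responded": True, "base_weight": 1.0},
    {"cell": ("E1-E3", "Male"), "responded": False, "base_weight": 1.0},
    {"cell": ("E1-E3", "Male"), "responded": False, "base_weight": 1.0},
    {"cell": ("E4-E6", "Female"), "responded": True, "base_weight": 1.0},
    {"cell": ("E4-E6", "Female"), "responded": True, "base_weight": 1.0},
]
weights = cell_adjusted_weights(sample)
```

The single respondent in the first cell receives a weight of 3 because that respondent also represents two non-respondents; the adjusted weights sum to the total base weight of the sample.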

The third stage of weight development involves a further adjustment to the weights to make the resultant weighted estimates from the sample conform to known population values for some key variables. For voluntary surveys, non-response is the factor that most affects the accuracy of the survey estimates. Different surveys achieve different response rates; the surveys with the highest response rates tend to be those that ask questions that seem relevant and interesting to respondents. But even with ‘popular’ surveys, response rates have been declining in recent years and, as a direct consequence, worries about survey bias have been increasing. Non-response is only a problem if the non-respondents are a non-random sample of the total sample. Unfortunately, this seems almost always to be the case [21]. Thus, another form of adjustment is needed to force the sample joint distribution of certain variables (“group” and “gender” in this project) to match the known population joint distribution. This type of adjustment is often called post-stratification. It is called a post-stratification weight because it can only be computed after all data are collected. The stratification part comes from the fact that various known strata (such as age group or gender distribution) of the population are needed to adjust the sample data to conform more closely to the population's parameters. This stage of adjustment serves two purposes: to compensate for non-coverage and to improve the precision of the survey estimates. It can also be used to compensate for non-response. It should be noted that the theory for post-stratification presented in survey sampling texts assumes full response and perfect coverage. In that situation, the adjustments are generally relatively small provided that the sample sizes in the post-strata are reasonably large, and on average post-stratification can be expected to lead to gains in precision for the survey estimates [9]. However, when sizeable non-coverage and/or non-response is involved, the adjustments can be substantial; in this case the adjustments are used to reduce the bias of the survey estimates, but standard errors for estimates unrelated to the adjustment variables may be increased.

2.3  Typical  Types  of  Weighting  and  Computing  Methods 

From the three stages of the weighting process above, we can see that the two most common types of survey weights are design weights and post-stratification (or non-response) weights. The design weight belongs to Stages 1 and 2, and is normally used to compensate for over- or under-sampling of specific cases or for disproportionate stratification. The post-stratification weight belongs to Stage 3, and is used to compensate for the fact that persons with certain characteristics are not as likely to respond to the survey. The computing methods are discussed in the following subsections.

2.3.1  Computing  Design  Weights 

It is straightforward to calculate design weights. Supposing we know the sampling fraction for each case, the weight is the inverse of the sampling fraction [5]:

$$\text{Design weight} = \frac{1}{\text{sampling fraction}}$$

The sampling fraction could also be the over-sampling amount for a given group or area. For example, if we oversampled African Americans at a rate 5 times greater than the rate for Whites, then the design weight for an African American respondent would be 1/5 and for a White respondent would be 1.
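
A minimal sketch of this calculation, using the relative sampling rates from the example above. The function name `design_weight` and the group labels are hypothetical; exact fractions are used only to keep the 1/5 result readable.

```python
from fractions import Fraction

def design_weight(sampling_fraction):
    """Design weight as the inverse of the (relative) sampling fraction."""
    if sampling_fraction <= 0:
        raise ValueError("sampling fraction must be positive")
    return 1 / Fraction(sampling_fraction)

# Relative sampling rates: one group is sampled at 5 times the baseline rate.
relative_rate = {"oversampled": 5, "baseline": 1}
weights = {group: design_weight(rate) for group, rate in relative_rate.items()}
```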

2.3.2 Computing Post-Stratification Weights

However,  it  is  normally  more  difficult  to  weight  for  non-response  with  post-stratification.  This  type 
of  weight  is  calculated  using  population  data.  It  requires  the  use  of  auxiliary  information  about  the 
population  and  may  take  a  number  of  different  variables  into  account.  Specifically,  we  need  population 
estimates  of  the  distribution  of  a  set  of  demographic  characteristics  that  have  also  been  measured  in  the 
sample.  This  is  essentially  a  two-step  procedure [21]: 

Step  1:  identify  a  set  of  ‘control  totals’  (a  set  of  demographic  characteristics)  for  the  population  that 
the  survey  ought  to  match; 

Step  2:  calculate  weights  to  adjust  the  sample  totals  to  the  control  totals. 

If there is only one characteristic to balance with the population, then the weight is computed by the following formula:

$$\text{weight} = \frac{\text{population proportion}}{\text{sample proportion}}$$

Table 4: Example for Calculating Post-Stratification Weights

Gender    Population Proportion    Sample Proportion    Population/Sample    Weight
Female    .5                       .7                   .5/.7                .714285
Male      .5                       .3                   .5/.3                1.66667
Total     1                        1
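
The calculation in Table 4 can be reproduced with a few lines of Python. This is only a sketch of the single-characteristic formula; the function and variable names are hypothetical.

```python
def post_stratification_weights(population_props, sample_props):
    """Weight per category = population proportion / sample proportion."""
    return {k: population_props[k] / sample_props[k] for k in population_props}

population_props = {"Female": 0.5, "Male": 0.5}
sample_props = {"Female": 0.7, "Male": 0.3}
weights = post_stratification_weights(population_props, sample_props)

# After weighting, the sample proportions match the population proportions.
weighted_props = {k: sample_props[k] * weights[k] for k in weights}
```

Multiplying each sample proportion by its weight returns the population proportion, which is the purpose of the adjustment.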

In analysis, only one weight per case can be used. It is not unusual to weight for several factors; if there is more than one characteristic to balance with the population, the separate weights must be combined into one weight. The following options are available for dealing with multiple characteristics:

Option 1: use one big N-way crosstab. To use this option, the crosstab must be available from the population source, and the number of cases in each cell of the sample cannot be too small. This project adopts this option because there are only two characteristics to balance (group and gender), we are able to compute the 9 x 2 crosstab, and the number of cases in each cell is not small.

Option 2: use several separate frequency tables, one for each characteristic of the population. The advantage of this option is that single-variable frequency tables are more likely to be available for the population, and using frequency tables may reduce the unstable weights caused by small sample counts in the cells of N-way crosstabs. The disadvantage is that the weights for the individual characteristics must be combined. According to [5], there are three ways to do this:

a.  Compute  a  weight  for  each  characteristic  independently  and  then  multiply  all  these  weights 
together.  This  method  is  not  recommended  since  it  will  usually  not  yield  good  weights. 

b.  Compute  weights  separately  but  sequentially.  Supposing  there  are  three  characteristics  A,  S,  E. 
The  method  is  done  by  following  iterative  process: 

1. Compute the A weight (wA) and weight the data by this weight. Generate the weighted frequency table for S.

2. Compute the S weight (wS) and weight by wA*wS. Generate the weighted frequency table for E.

3. Compute the E weight (wE) and weight by wA*wS*wE. Generate the weighted frequency table for A.

4. Compute a second A weight (wA2) and weight by wA*wS*wE*wA2. Generate the weighted frequency table for S.

5. Compute a second S weight (wS2) and weight by wA*wS*wE*wA2*wS2. Generate the weighted frequency table for E.

6. Compute a second E weight (wE2) and weight by wA*wS*wE*wA2*wS2*wE2.

Continue the process until the weighted frequencies match the population frequencies and no longer change. The procedure usually converges after two or three iterations (or fewer).

Several software tools carry out the above iterative procedure automatically, such as the SAS raking macro and the Stata ado.

c. Use a Logistic Regression approach to weighting. This approach requires that the dataset in use contains the information for the population figures. In this project, we adopt the Logistic Regression approach with the support of a Decision Tree. That is why it is necessary to reconstruct the population data, including the non-responses. This method is as follows [5].

1.  Supposing  reference  population  data  set  includes  age,  education,  race  (in  categories),  gender, 
and  metropolitan  status  variables. 

2.  Assume  we  have  the  same  variables  measured  in  the  same  way  in  the  dataset  we  want  to 
weight  to  increase  representativeness. 

3. Create a subset of the reference population with just these variables and add an indicator called “Sample” set equal to 0. Also create a subset of your survey with the same variables, formatted the same as the reference population data, and set “Sample” equal to 1.

4. Combine the cases from the two data sets.

5. Use “Sample” as the dependent variable in a logistic regression with each of the other characteristics as independent variables. Set the regression program to save the predicted probability (pprob) from the regression for each case and include it in the dataset.

6. The weight is the inverse of this predicted probability: $\text{weight} = \frac{1}{\text{pprob}}$

7. This approach yields weights that are highly correlated with those obtained by raking.
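
The sequential procedure in option b above is commonly called raking (iterative proportional fitting). A minimal sketch for two characteristics follows, assuming made-up respondent records and population margins. The function name `rake`, the iteration limit, and the convergence tolerance are hypothetical choices, not part of the procedure described in [5].

```python
def rake(records, margins, max_iter=50, tol=1e-9):
    """Return one weight per record so weighted margins match the targets.

    records: list of dicts, e.g. {"gender": "F", "group": "E1-E3"}.
    margins: {variable: {category: population proportion}}.
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in margins.items():
            total = sum(weights)
            # Current weighted proportion of each category of this variable.
            current = {cat: 0.0 for cat in target}
            for rec, w in zip(records, weights):
                current[rec[var]] += w / total
            # Multiply each weight by population / weighted-sample proportion.
            for i, rec in enumerate(records):
                ratio = target[rec[var]] / current[rec[var]]
                weights[i] *= ratio
                max_change = max(max_change, abs(ratio - 1.0))
        if max_change < tol:
            break  # a full pass changed nothing: margins are matched
    return weights

records = (
    [{"gender": "F", "group": "E1-E3"}] * 40
    + [{"gender": "F", "group": "E4-E6"}] * 30
    + [{"gender": "M", "group": "E1-E3"}] * 10
    + [{"gender": "M", "group": "E4-E6"}] * 20
)
margins = {
    "gender": {"F": 0.5, "M": 0.5},
    "group": {"E1-E3": 0.6, "E4-E6": 0.4},
}
weights = rake(records, margins)

def weighted_share(var, cat):
    """Weighted proportion of records with rec[var] == cat."""
    total = sum(weights)
    return sum(w for r, w in zip(records, weights) if r[var] == cat) / total
```

Each pass adjusts one characteristic at a time, so only single-variable frequency tables are needed for the population, as noted for option 2.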

The main constraint on using post-stratification is that the population distributions must be known. This means that the only control totals that can be used are the ones that are available and known to be accurate. For most DEOCS surveys, control totals tend to be rank, service, gender and enlisted.

We divide non-response weights and design weights into different types, but they look similar, at least in terms of mathematics. A possible difference is that design weights are known exactly but non-response weights are only estimated.

For design weights we know how many units were selected and how many were in the sampling frame. Non-response weights are estimated by comparing responding units to totals from the population or from the sampling frame. If we repeated the sampling procedure many times, we would get different numbers of non-responding units in each post-stratum. This would give different non-response weights in each possible sample. This uncertainty in the exact value of non-response weights should be reflected in the standard errors of the non-response adjusted analyses. Only replication methods such as jackknife and bootstrap methods include this adjustment. If the post-stratification is based on a simple model, the contribution to the standard error from the uncertainty in the weights will be very small.

2.3.3  Computing  Weights  using  Survey  Information  from  Sampling  Frame 

Sometimes non-response re-weighting can be carried out by comparing the characteristics of those who responded to a survey with the whole group that the survey attempted to reach. This will not be very helpful when (as in most household surveys) we don’t know much about the people who do not respond. This approach is most helpful when we are selecting a sample from an informative sampling frame. Examples might be surveys of a workforce, where we know the grade, age, and length of service of all employees. Another circumstance in which this is used is in longitudinal surveys, when a survey is re-contacting people who responded at a previous wave of the survey. This is done in three steps [21].

1. Carry out an investigation of which factors predict that a response has been received.

2. Apply a weight to the responding classes that is proportional to $\frac{1}{\text{probability of responding}}$.

3. Finally, the weights, at this stage generally above 1.0, are usually rescaled so as to add to the responding sample numbers.

At  the  first  step  a  response  variable  is  attached  to  the  sampling  frame  that  is  coded  as  1  for 
responders  and  0  for  non-responders.  Where  the  sampling  frame  only  tells  us  a  few  things  about  the 
units  we  can  divide  the  sample  up  into  groups  (called  response  classes)  and  the  probability  of  response  is 
simply  the  proportion  who  respond  in  each  response  class. 

When  the  sampling  frame  contains  more  detailed  information  about  the  non-responders  the  factors 
that  influence  non-response  are  often  investigated  via  logistic  regression.  The  resulting  model  is  then 
used  to  calculate  the  probability  of  response  at  Step  1  above,  and  the  subsequent  steps  are  carried  out  in 
the  same  way  as  above. 


When  this  type  of  regression  model  is  used  we  need  to  strike  a  balance  between  having  a  powerful 
model  to  predict  non-response  (and  so  reduce  bias)  and  the  introduction  of  extreme  weights  that  will 
affect  precision.  Models  are  often  simplified  at  the  final  stage  to  avoid  extreme  weights  (either  large  or 
small).  Another  practice  that  some  surveys  employ  is  to  cap  the  weights,  for  example  by  replacing  all 
weights  above  2.5  with  the  value  of  2.5. 
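
The three steps above, together with the capping practice, can be sketched for the simple case of response classes. The class labels, counts, and function names are hypothetical; the cap of 2.5 is the example value from the text, and applying the cap after rescaling is an assumption of this sketch.

```python
def response_class_weights(classes):
    """classes: {label: (n_responders, n_nonresponders)} from the sampling frame.

    Returns {label: weight}, rescaled so that the weights summed over all
    responders equal the number of responders.
    """
    # Probability of response = proportion who respond in each class.
    prob = {c: r / (r + n) for c, (r, n) in classes.items()}
    raw = {c: 1.0 / p for c, p in prob.items()}  # inverse response probability
    n_resp = sum(r for r, _ in classes.values())
    raw_total = sum(raw[c] * r for c, (r, _) in classes.items())
    scale = n_resp / raw_total
    return {c: w * scale for c, w in raw.items()}

def cap_weights(weights, cap=2.5):
    """Replace every weight above the cap with the cap value."""
    return {c: min(w, cap) for c, w in weights.items()}

classes = {"A": (50, 50), "B": (10, 190), "C": (30, 70)}
weights = response_class_weights(classes)
capped = cap_weights(weights)
```

Class B has a very low response rate, so its rescaled weight is well above the cap and is the only one changed by `cap_weights`.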

2.3.4  Combine  Different  Types  of  Weights 

As discussed above, it is usual to compute different types of weights for the same dataset, since we need to weight for different factors. However, only one weight can be used when analyzing the dataset, so these weights must be combined into one weight before use. Suppose that a design weight (Dwate) and a post-stratification weight (PSwate) have been computed for each case. Then the total weight is the product of these two weights:

Total Weight = Dwate x PSwate

Furthermore, a weight cannot be equal to zero unless we want the case excluded from the analysis. The default value is set to 1.

3.  Decision  Tree 

Decision tree models make it possible to develop classification systems that predict or classify future
observations  based  on  a  set  of  decision  rules.  If  we  have  data  divided  into  classes  that  interest  us,  we  can 
use  the  data  to  build  rules  that  we  can  use  to  classify  old  or  new  cases  with  maximum  accuracy.  For 
example,  we  might  build  a  tree  that  classifies  credit  risk  or  purchase  intent  based  on  age  and  other 
factors  [2,29]. 

Tree  building  Algorithm  [10].  Four  algorithms  are  available  for  performing  classification  and 
segmentation  analysis.  These  algorithms  all  perform  basically  the  same  thing:  they  examine  all  of  the 
fields  of  your  dataset  to  find  the  one  that  gives  the  best  classification  or  prediction  by  splitting  the  data 
into  subgroups.  The  process  is  applied  recursively,  splitting  subgroups  into  smaller  and  smaller  units 
until  the  tree  is  finished  (as  defined  by  certain  stopping  criteria).  The  target  and  input  fields  used  in  tree 
building  can  be  continuous  (numeric  range)  or  categorical,  depending  on  the  algorithm  used.  If  a 
continuous  target  is  used,  a  regression  tree  is  generated;  if  a  categorical  target  is  used,  a  classification 
tree  is  generated. 


Table 5: Decision Tree Modules

Feature \ Algorithm: C&R Tree, QUEST, CHAID, C5.0

Input fields (predictors)
    C&R Tree: continuous, categorical, flag, nominal or ordinal
    QUEST:    continuous, categorical, flag, nominal or ordinal
    CHAID:    continuous, categorical, flag, nominal or ordinal
    C5.0:     continuous, categorical, flag, nominal or ordinal

Target fields
    C&R Tree: continuous, categorical, flag, nominal or ordinal
    QUEST:    categorical, flag or nominal
    CHAID:    continuous, categorical, flag, nominal or ordinal
    C5.0:     flag, nominal or ordinal

Type of split
    C&R Tree: binary splits
    QUEST:    binary splits
    CHAID:    more than two branches at a time
    C5.0:     more than two branches at a time

Method used for splitting
    C&R Tree: for categorical output, a dispersion measure is used (by default the Gini coefficient); for continuous targets, the least squared deviation method is used
    QUEST:    chi-square test for categorical predictors, and analysis of variance for continuous inputs
    CHAID:    chi-square test
    C5.0:     an information theory measure is used, the information gain ratio

Missing value handling
    C&R Tree: uses substitute prediction fields, where needed, to advance a record with missing values through the tree during training
    QUEST:    uses substitute prediction fields, where needed, to advance a record with missing values through the tree during training
    CHAID:    makes the missing values a separate category and enables them to be used in tree building
    C5.0:     uses a fractioning method, which passes a fractional part of a record down each branch of the tree from a node where the split is based on a field with a missing value

Pruning
    C&R Tree: offers the option to grow the tree fully and then prune it back by removing bottom-level splits that do not contribute significantly to the accuracy of the tree
    QUEST:    offers the option to grow the tree fully and then prune it back by removing bottom-level splits that do not contribute significantly to the accuracy of the tree
    CHAID:    (no entry)
    C5.0:     offers the option to grow the tree fully and then prune it back by removing bottom-level splits that do not contribute significantly to the accuracy of the tree

Interactive tree building
    C&R Tree: provides an option to launch an interactive session
    QUEST:    provides an option to launch an interactive session
    CHAID:    provides an option to launch an interactive session
    C5.0:     does not offer this option

Prior probabilities
    C&R Tree: supports the specification of prior probabilities for categories when predicting a categorical target field
    QUEST:    supports the specification of prior probabilities for categories when predicting a categorical target field
    CHAID:    does not support specifying prior probabilities
    C5.0:     does not support specifying prior probabilities

Rule sets
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In  this  project,  we  will  use  re-constructed  population  to  run  decision  tree  module  with  CHAID. 
“Group”  and  “gender”  are  independent  variables,  and  “indicator”  is  dependent  variable.  The  output  of 
decision  tree  are  probabilities  of  responder  and  non-responder  in  each  node  of  tree. 

Then,  we  use  “indicator”  as  a  dependent  variable  in  a  logistic  regression  with  probability  for 
responses,  which  is  output  from  decision  tree,  as  independent  variables.  We  set  the  regression  program 
to  save  the  predicted  probability  (pprob)  from  the  regression  for  each  node  and  include  it  in  the  dataset. 

Next,  compute:  weight= — - — ^ — — — — . 

predicted  probability 

The  last  step  is  optional.  Re-scale  the  weights,  which  at  this  stage  generally  are  above  1.0,  are 
usually  rescaled  so  as  to  add  to  the  responding  sample  numbers. 
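The weighting and re-scaling steps above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in this project; the function name and the example probabilities are hypothetical, and it assumes the predicted response probability (pprob) is already known for each responding case.

```python
def inverse_probability_weights(pprob, rescale=True):
    """Return 1/pprob weights, optionally rescaled to sum to the number of cases.

    pprob: predicted probability of responding, one value per responding case.
    """
    if any(p <= 0 or p > 1 for p in pprob):
        raise ValueError("predicted probabilities must lie in (0, 1]")
    raw = [1.0 / p for p in pprob]
    if not rescale:
        return raw
    # Rescale so that the weights add up to the number of responding cases.
    factor = len(raw) / sum(raw)
    return [w * factor for w in raw]


# Hypothetical predicted probabilities for four responding cases.
pprob = [0.25, 0.5, 0.5, 0.4]
raw_weights = inverse_probability_weights(pprob, rescale=False)
scaled_weights = inverse_probability_weights(pprob)
```

Before re-scaling every weight is at least 1.0, because each probability is at most 1; after re-scaling the weights keep their relative sizes but sum to the number of responders.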

4.  Python 

Python  is  a  widely  used  high-level  programming  language  for  general-purpose  programming, 
created  by  Guido  van  Rossum  and  first  released  in  1991.  An  interpreted  language,  Python  has  a  design 
philosophy  which  emphasizes  code  readability  (notably  using  whitespace  indentation  to  delimit  code 
blocks  rather  than  curly  brackets  or  keywords),  and  a  syntax  which  allows  programmers  to  express 
concepts  in  fewer  lines  of  code  than  might  be  used  in  languages  such  as  C++  or  Java.  The  language 
provides  constructs  intended  to  enable  writing  clear  programs  on  both  a  small  and  large  scale. 

Python  features  a  dynamic  type  system  and  automatic  memory  management  and  supports  multiple 
programming  paradigms,  including  object-oriented,  imperative,  functional  programming,  and  procedural 
styles.  It  has  a  large  and  comprehensive  standard  library.  Python  interpreters  are  available  for  many 
operating  systems,  allowing  Python  code  to  run  on  a  wide  variety  of  systems [17,  25]. 

The IBM SPSS Statistics - Integration Plug-in for Python provides two interfaces for programming with the Python language within IBM SPSS Statistics on Windows, Linux, Mac OS, and for IBM SPSS Statistics Server [11]:




Python  Integration  Package:  The  Python  Integration  Package  provides  functions  that  operate  on 
the  IBM  SPSS  Statistics  processor,  extending  IBM  SPSS  Statistics  command  syntax  with  the  full 
capabilities  of  the  Python  programming  language.  With  this  interface,  we  can  access  IBM  SPSS 
Statistics  variable  dictionary  information,  case  data,  and  procedure  output.  We  can  submit  command 
syntax  to  IBM  SPSS  Statistics  for  processing,  create  new  variables  and  new  cases  in  the  active  dataset, 
or  create  new  datasets.  We  can  also  create  output  in  the  form  of  pivot  tables  and  text  blocks,  all  from 
within  Python  code. 

Scripting Facility: The Scripting Facility provides Python functions that operate on user interface and output objects. With this interface, we can customize pivot tables and export items such as charts and tables in various formats. We can also start IBM SPSS Statistics dialog boxes and manage connections to instances of IBM SPSS Statistics Server, all from within Python code.

In this project, the only available software is SPSS. Thus, the Integration Plug-in for Python is our tool for manipulating the data.

5.  Design  and  Analysis  of  Algorithm 

In the preceding sections, we analyzed the features of the dataset for this project: large scale, a low response rate, and the lack of a reference population. We therefore need to weight for non-response, and in order to weight we need to re-construct the reference population. The design of the project is as follows:

First step: design and implement an algorithm in Python to re-construct the population.

Second step: two methods were adopted to weight for non-response:

>  The first method is to compute post-stratification weights by matching crosstabs.

>  The second method is to use a logistic regression approach to weighting. The SPSS decision tree module with CHAID is used to compute the probabilities of the predictive factors.

Last step: we compare the weights from the different methods, followed by further suggestions and discussion.


The flow and components of the project are as follows:


Figure 1: Flow and Components of Project

[Flowchart; the recoverable content of its boxes is listed below in order.]

1. Begin process.

2. Generate 4569 separate dataset files, one file per unit.

3. For each file/unit, generate 'fake' rows for non-responses as follows:

   *  count the number of responses in each cell of the 'rank x gender' crosstab;
   *  calculate the difference in each cell of the 'rank x gender' crosstab between the reference population and the survey;
   *  based on the above difference, generate 'fake' rows for non-responses, each of which has the demographic characteristics rank and gender.

4. Concatenate the 4569 separate dataset files into one data file.

5. The result is a new dataset with responses and non-responses, which will be used as the reference population.

6a. Post-stratification weighting by crosstab match:

   *  Using the whole reference population, compute
      $$\text{population proportion}(i) = \frac{\text{\# of persons in the } i\text{th cell in the population}}{\text{population}}$$
   *  Using the unit reference population, compute
      $$\text{unit population proportion}(i) = \frac{\text{\# of persons in the } i\text{th cell in the unit}}{\text{population in this unit}}$$
   *  Using the unit responses, compute
      $$\text{unit sample proportion}(i) = \frac{\text{\# of responses in the } i\text{th cell in the unit}}{\text{total \# of responses in the unit}}$$
   *  Compute
      $$\text{weight}_1(i) = \frac{\text{population proportion}(i)}{\text{sample proportion}(i)}, \qquad \text{weight}_2(i) = \frac{\text{unit population proportion}(i)}{\text{sample proportion}(i)}$$
      $$\text{weight}(i) = \text{weight}_1(i) \times \text{weight}_2(i)$$

6b. Post-stratification weighting by logistic regression and decision tree:

   *  Use the whole population to run the decision tree module with CHAID: rank and gender are the independent variables, indicator is the dependent variable.
   *  Use "indicator" as the dependent variable in a logistic regression, with the probability of response output by the decision tree as the independent variable. Set the regression program to save the predicted probability from the regression for each node and include it in the dataset.
   *  Compute
      $$\text{weight} = \frac{1}{\text{probability of responding}}$$
   *  Re-scale the weights: at this stage they are generally above 1.0, and they are usually rescaled so that they add up to the number of responding cases.

7. Use a t-test to compare the weights.


The algorithm for re-construction can be broken into five components as follows:

Table 6: Components for Re-construction and Computing Weights

Component 1

Operation time: 20-24 hours

Purpose: Generate unit dataset files. Compute and add all non-responder cases for each unit dataset file.

Input: The original dataset file with the new variable "indicator".

Procedure: Iterate on units:

1.  generate one unit dataset file;

2.  compute the missing responses for each cell of the "group and gender" crosstab, generate non-response rows, and attach all non-responder cases to this unit dataset file;

3.  output this unit dataset file, and then close it.

Output: 4500+ separate dataset files, one file per unit.

Comments:

1.  I have added a new variable "indicator" to the original dataset. The name of the new dataset file is "AlldatawithIndicator.sav".

2.  After opening the original data file, it is necessary to compute another new variable from $CASENUM and set it as the last variable.

3.  The value of indicator is 1 for every case in the original file, which means all of those cases are responders. Its value for a non-response is 0.

4.  If the reference population is missing, then no rows for non-response will be added.

5.  Error 1: In the crosstab, the value of each cell of the reference population is supposed to be greater than or equal to the corresponding number of responses. However, some respondents may enter wrong information, and then this algorithm will generate extra rows. For example, this algorithm will generate one extra non-response row for the following unit:

                Population                 Sample
                Number of rows             Number of rows
                Total: 10                  Total: 10
                gender                     gender
    Rank 1      4        3                 5        2
    Rank 2      1        2                 1        2

6.  Error 2: missing value for "group" or "gender". Someone submits the survey but does not fill out gender or rank. For example, in the following survey there are 10 responses, but one person does not enter a rank, so the computation finds one person missing and generates one extra non-response row:

                Population                 Sample
                Total number of rows: 10   Total number of rows: 10
                gender                     gender
    Rank 1      4        3                 4        2
    Rank 2      1        2                 1        2

The above two types of error lead to extra cases, which will affect the subsequent analysis, including the decision tree and the computation of weights.
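The row-generation rule of Component 1, and the way Error 1 arises, can be sketched in plain Python. This is only an illustration, not the appendix code: the function name, the dictionary layout keyed by (rank, gender), and the gender codes 1 and 2 are assumptions, and negative cell differences are assumed to produce no rows.

```python
def nonresponse_rows(population, responses):
    """Generate one 'fake' non-response row per person missing from each cell.

    population, responses: dicts mapping (rank, gender) -> count for one unit.
    Cells with more responses than reference population yield no rows.
    """
    rows = []
    for cell in sorted(population):
        missing = population[cell] - responses.get(cell, 0)
        for _ in range(max(missing, 0)):
            rows.append({"rank": cell[0], "gender": cell[1], "indicator": 0})
    return rows


# The Error 1 example: both totals are 10, but one respondent is misclassified.
population = {(1, 1): 4, (1, 2): 3, (2, 1): 1, (2, 2): 2}
sample = {(1, 1): 5, (1, 2): 2, (2, 1): 1, (2, 2): 2}
extra_rows = nonresponse_rows(population, sample)
```

Although every member of this unit responded, the cell-by-cell comparison still produces one non-response row for rank 1 in the second gender column, which is the extra case described under Error 1.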
Component 2

Operation time: 1 hour

Purpose: Automatically concatenate files.

Input: The unit dataset files (4500+).

Procedure: Iterate on files:

1.  count 50 files;

2.  use "ADD FILES" to concatenate every 50 files;

3.  concatenate the leftover files.

Output: One dataset file "allcombined.sav", which includes all responses and non-responses.

Comments: The maximum number of files that the SPSS "ADD FILES" command can take is 50, so we have to set up a loop to concatenate 50 files at a time.
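The batching in Component 2 can be sketched as follows. This is a simplified illustration rather than the appendix code: the file names are hypothetical, and the sketch only builds the command text for each batch instead of submitting it to SPSS.

```python
def batches(items, size=50):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def add_files_syntax(paths):
    """Build the text of one ADD FILES command for the given .sav paths."""
    lines = ["ADD FILES"]
    lines.extend("  /FILE='%s'" % p for p in paths)
    return "\n".join(lines) + ".\nEXECUTE."


# Hypothetical unit file names, one per unit.
unit_files = ["unit_%04d.sav" % n for n in range(1, 4570)]
groups = batches(unit_files)
commands = [add_files_syntax(g) for g in groups]
```

With 4569 unit files this yields 91 full batches and one leftover batch of 19 files. Because the number of intermediate results is itself above 50, the same batching would have to be applied again to the intermediate files before a single combined file is obtained.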

Component 3

Operation time: Less than 10 minutes

Purpose: Generate the decision tree, and then run logistic regression to get the predicted probability.

Input: The input of the decision tree is the dataset file "allcombined.sav" from Component 2. The input of the regression is the probability of response from the decision tree.

Procedure: Run SPSS TREE first, and then run SPSS LOGISTIC REGRESSION.

Output:

Decision tree: "Output_07012017_l_treeoutput.spv"

Predicted probability for responders: "NodeIDbyProb_07032017.sav", "allcombined_07012017_Treel.sav"

Comments: The input of the Decision Tree module is the whole population dataset file. The output is the decision tree and a new variable "predicted probability".

Component 4-1

Operation time: About 25 minutes

Purpose: Compute weights by using the unit reference population.

Input: One dataset file: "alldatawithoutindicator.sav". This is in fact the original dataset.

Procedure: Iterate on units:

1.  Count the number of responses in each cell of the group x gender crosstab.

2.  Compute:
    $$\text{respondersratio} = \frac{\text{Number of responses in each cell}}{\text{Total responses in this unit}}$$

3.  Compute:
    $$\text{referenceratio} = \frac{\text{Number in each cell of unit reference population}}{\text{Population of this unit}}$$

4.  Compute:
    $$\text{weight} = \frac{\text{referenceratio}}{\text{respondersratio}}$$

Output: One dataset file "weights_%(name)s.sav".

Comments:

1.  Please make sure that the variable $casenum is the last variable, since I will use it to generate the dictionary for all units.

2.  The output of this syntax is a dataset file with "UnitNumber, UnitID, and 18 weights for those 18 cells in crosstab".

3.  In fact this component can be embedded in Component 1, but because of memory problems I have to separate it.
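The cell-weight computation of Component 4-1 can be sketched for a single unit as below. The function name, the (group, gender) dictionary keys, and the counts are made up for illustration; cells with no responses are simply skipped here, which is an assumption and not necessarily what the appendix code does.

```python
def cell_weights(reference_counts, response_counts):
    """Post-stratification weight per crosstab cell for one unit.

    weight = referenceratio / respondersratio, where each ratio is the cell
    count divided by the corresponding unit total.
    """
    total_reference = float(sum(reference_counts.values()))
    total_responses = float(sum(response_counts.values()))
    weights = {}
    for cell, n_responses in response_counts.items():
        if n_responses == 0:
            continue  # no responder in this cell, so nothing to weight
        referenceratio = reference_counts.get(cell, 0) / total_reference
        respondersratio = n_responses / total_responses
        weights[cell] = referenceratio / respondersratio
    return weights


# Hypothetical unit: 100 people in the reference population, 40 responses.
reference = {(1, 1): 40, (1, 2): 10, (2, 1): 30, (2, 2): 20}
responses = {(1, 1): 10, (1, 2): 5, (2, 1): 20, (2, 2): 5}
weights = cell_weights(reference, responses)
weighted_total = sum(weights[c] * responses[c] for c in responses)
```

When every reference cell has at least one responder, the weighted responses add up to the number of actual responses, so the weights change the mix of cells without changing the sample size.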

Component 4-2

Operation time: About 25 minutes

Purpose: Compute weights by using the whole reference population.

Input: One dataset file: "alldatawithoutindicator.sav". This is in fact the original dataset.

Procedure: Iterate on units:

1.  Count the number of responses in each cell of the group x gender crosstab.

2.  Compute:
    $$\text{respondersratio} = \frac{\text{Number of responses in each cell}}{\text{Total responses in this unit}}$$

3.  Compute:
    $$\text{rratio} = \frac{\text{Number in each cell of whole reference population}}{\text{The whole population}}$$

4.  Compute:
    $$\text{weight} = \frac{\text{rratio}}{\text{respondersratio}}$$

Output: One dataset file "Myweights0703201701_%(name)s.sav".

Comments:

1.  Please make sure that the variable $casenum is the last variable, since I will use it to generate the dictionary for all units.

2.  The output of this syntax is a dataset file with "UnitNumber, UnitID, and 18 weights for those 18 cells in crosstab".

3.  In fact this component can be embedded in Component 1, but because of memory problems I have to separate it.

4.  The variables "rratio11", etc. are ratios: the population proportion of cell 11 = (total number of "rank 1 gender 1") / whole population. However, I do not need to compute this ratio, since I just read it from the outcome of the decision tree.

Component 5-1

Operation time: 16 hours

Purpose: Attach the unit-based weights to each case in the original dataset file.

Input:

Original dataset file: "originaldatawithweightsr.sav". This is in fact the original dataset file.

"weightsxDatasetlunitpop.sav". It contains the weights based on the unit reference population and is the output from Component 4-1.

Procedure: Iterate on cases:

1.  Compute the rank and gender value.

2.  Compute a new variable "localweights".

Output: One dataset file: originaldatawithlocalweights.sav

Comments: Components 5-1, 5-2, and 5-3 could be merged into a single component, but because of memory constraints I keep them separate.
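The case-by-case attachment in Component 5-1 amounts to a lookup of each case's cell weight. The following sketch shows that idea on in-memory records; it is not the SPSS-based appendix code, and the field names "unit", "rank", "gender" as well as the example values are hypothetical.

```python
def attach_weights(cases, weights, name="localweights", default=None):
    """Add a weight to every case by looking up its (unit, rank, gender) cell.

    cases: list of dicts with keys "unit", "rank" and "gender".
    weights: dict mapping (unit, rank, gender) -> weight.
    Cases whose cell has no weight receive `default`.
    """
    for case in cases:
        key = (case["unit"], case["rank"], case["gender"])
        case[name] = weights.get(key, default)
    return cases


# Hypothetical unit weights and two responding cases.
unit_weights = {(129933, 1, 1): 1.6, (129933, 1, 2): 0.8}
cases = [
    {"unit": 129933, "rank": 1, "gender": 1},
    {"unit": 129933, "rank": 2, "gender": 1},
]
attach_weights(cases, unit_weights)
```

The same function could be reused with a different `name` for the global and decision tree weights of Components 5-2 and 5-3, provided the weights are keyed in the same way.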

Component 5-2

Operation time: 13 hours

Purpose: Attach the global weights to each case.

Input:

"originaldatawithlocalweights.sav". This is the output dataset file from Component 5-1.

"Myweights0703201701_xDatasetl_wholepop.sav". This is the output dataset file from Component 4-2.

Procedure: Iterate on cases:

1.  Compute the rank and gender value.

2.  Compute a new variable "globalweights".

Output: One dataset file: originaldatawithlocalandglobalweights.sav

Comments: Components 5-1, 5-2, and 5-3 could be merged into a single component, but because of memory constraints I keep them separate.

Component 5-3

Operation time: 15 hours

Purpose: Attach the decision tree weights to each case.

Input:

"originaldatawithlocalandglobalweights.sav". This is in fact the output dataset file from Component 5-2.

"NodeIDbyProb_07032017.sav". This is the output dataset file from Component 3. It contains the weights from the decision tree.

Procedure: Iterate on cases:

1.  Compute the rank and gender value.

2.  Compute a new variable "decisiontreeweights".

Output: One dataset file: originaldatawithlocalandglobalanddecisiontreeweights.sav

Comments: The component can be embedded in Component 1, but because of memory constraints I separate it.

The Python code for the algorithms of each component is given in the appendix. The most challenging and time-consuming component is Component 1. The design of its algorithm is as follows:

Step 1: Read in the whole dataset and compute a new variable casen that holds the case number:

GET FILE='E:\WEI WAN\My SPSS\TestFiles\AlldatawithIndicator.sav'.
DATASET NAME alldata.
SORT CASES BY AFEOCAID.
COMPUTE casen = $CASENUM.
FORMATS casen (F12.0).
VARIABLE LEVEL casen (SCALE).
EXECUTE.


The  outcome  is  the  following  dataset  for  all  responses  in  memory. 


Step 2: Generate a list of variable names

# Collect the name and the type of every variable in the active dataset.
ordlist = []
ordlist2 = []
for i in range(spss.GetVariableCount()):
    ordlist.append(spss.GetVariableName(i))
    ordlist2.append(spss.GetVariableType(i))

totalvar = spss.GetVariableCount()

The outcome will be two lists. One is the list of all variable names; the other is the list of all variable types.


[u'PopGrp', u'AFEOCAID', u'DEOCSID', u'DEOCSID7', u'UICCODE', u'MainUicCode', u'titleoforganization', u'commanderemail', u'commandername',
u'startdate', u'enddate', u'reportdate', u'submitdate', u'E1E3M', u'E1E3F', u'E4E6M', u'E4E6F', u'E7E9M', u'E7E9F', u'WOM', u'WOF',
u'O1O3M', u'O1O3F', u'O4AboveM', u'O4AboveF', u'GS1GS8M', u'GS1GS8F', u'GS9_SESM', u'GS9_SESF', u'OtherM', u'OtherF', u'SERVICE',
u'COMPONENT', u'accessCodeID', u'gender', u'hispanic', u'race1', u'race2', u'race3', u'race4', u'race5', u'Race', u'Maj_Min',
u'reside', u'deployed', u'fedcat', u'paygrad', u'branch', u'type', u'Q1', u'Q2', u'Q3', u'Q4', u'Q5', u'Q6', u'Q7', u'Q8', u'Q9',
u'Q10', u'Q11', u'Q12', u'Q13', u'Q14', u'Q15', u'Q16', u'Q17', u'Q18', u'Q19', u'Q20', u'Q21', u'Q22', u'Q23', u'Q24', u'Q25',
u'Q26', u'Q27', u'Q28', u'Q29', u'Q30', u'Q31', u'Q32', u'Q33', u'Q34', u'Q35', u'Q36', u'Q37', u'Q38', u'Q39', u'Q39A', u'Q40',
u'Q41', u'Q42', u'Q43', u'Q44', u'Q45', u'Q46', u'Q47', u'Q48', u'Q49', u'Q50', u'Q51', u'Q52', u'Q53', u'Q54', u'Q55', u'Q56',
u'Q57', u'Q58', u'Q59', u'Q60', u'Q61', u'Q62', u'Q63', u'Q64', u'Q65', u'Q66', u'Q67', u'Q68', u'Q69', u'Q70', u'Q71',
u'Q72A', u'Q72B', u'Q72C', u'Q72D', u'Q72E', u'Q72F', u'Q72G', u'Q72H', u'Q72I', u'Q72J', u'Q73A', u'Q73B', u'Q73C', u'Q73D',
u'Q73E', u'Q73F', u'Q73G', u'Q73H', u'Q73I', u'Q73J', u'Q73K', u'Q74', u'Q75', u'Q76', u'Q77A', u'Q77B', u'Q77C', u'Q77D', u'Q77E',
u'Q78', u'Q78A', u'Q79', u'Q80', u'Q81', u'Q82', u'Q83', u'Q84', u'Q85', u'IsPaper', u'LogID', u'Input_date',
u'Q74_1', u'Q74_2', u'Q74_3', u'Q74_4', u'Q74_5', u'Q74_6', u'Q74_7', u'Q74_8', u'Q74_9', u'Q74_10',
u'Q79_1', u'Q79_2', u'Q79_3', u'Q79_4', u'Q79_5', u'Q79_6', u'Q79_7', u'Q79_8', u'Q79_9', u'Q79_10', u'Q79_11',
u'Q81_1', u'Q81_2', u'Q81_3', u'Q81_4', u'Q81_5', u'Q81_6', u'Q81_7', u'Q84_1', u'Q84_2', u'Q84_3', u'Q84_4', u'Q84_5', u'Q84_6', u'Q84_7',
u'Month', u'Quarter', u'DoD', u'Group', u'Rank', u'RankJr', u'RankJrMil', u'OFFvsENL', u'MILvsCIV', u'Organization', u'ServComponent',
u'BroadJrEnlGender', u'JrEnlGender', u'TypeTotal', u'ActiveMilCiv', u'ArmyVsAir', u'INTENDSTAY', u'FAVORITISM', u'RQ19', u'RQ25',
u'RQ32', u'RQ36', u'RQ40', u'RQ44', u'RQ47', u'RQ56', u'RQ62', u'RQ65', u'OrgCom', u'TrustLead', u'OrgPerf', u'OrgCoh', u'LeadCoh',
u'JobSat', u'DivMgt', u'OrgProc', u'HelpSeek', u'Exhaust', u'Hazing', u'Demean', u'RacDisc', u'SexDisc', u'RelDisc', u'SexHar',
u'Racist', u'Sexist', u'AgeDisc', u'DisDisc', u'Safety1', u'Safety2', u'CoCSupport1', u'CoCSupport2', u'CoCSupport3', u'CoCSupport4',
u'CoCSupport5', u'Publicity1', u'Publicity2', u'Publicity3', u'CoCSupport6', u'CoCSupport7', u'URC1', u'URC2', u'URC3', u'URC4',
u'URC5', u'URC6', u'URC7', u'URC8', u'URC9', u'URC10', u'SafetyPercep', u'SafeLiveUF', u'SafeWorkUF', u'CoCSupport', u'Publicity',
u'PublicityUF1', u'PublicityUF2', u'PublicityUF3', u'URC7rev', u'URC9rev', u'URC10rev', u'URC', u'BarriersTotal', u'BarriersTri',
u'Bystander1', u'Bystander1UF', u'Bystander2', u'BystanderScale', u'BystanderObs1', u'BystanderObs2', u'Knowledge1', u'Knowledge2',
u'Knowledge3', u'Knowledge4', u'Knowledge5', u'KnowledgeScale', u'JrEnlvNCOvAll', u'JrEnandNCO', u'BystanderObs2Dich',
u'Metric4Composite', u'Metric9CompositeR', u'Metric9NEWComposite', u'Metric11Composite', u'SenorityGender', u'PasswordsRequested',
u'FirstDEOCS', u'Occurrence', u'ACTSERVICE', u'WRank', u'PopNum', u'filter_$', u'TotalN', u'indicator', u'casen']

[0,  0,  9,  0,  25,  25,  27,  40,  19,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 
,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 

0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 
,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  25,  0,  0,  0,  0,  0,  0,  0,  0,  0,  25,  0,  25,  0,  0,  25,  0,  0,  25,  0,  0,  0,  0,  0,  0 
,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 
,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0] 

Step 3: Change from Python Unicode to string. (Strings received by Python from IBM SPSS Statistics are converted from UTF-8 to Python Unicode, which is UTF-16.)

# Pair every variable name, encoded as an ASCII string, with its type.
length = len(ordlist)
newordlist = []
for i in range(length):
    newordlist.append([ordlist[i].encode('ascii', 'ignore'), ordlist2[i]])


The outcome will be one list as follows. Each element of this list is a pair: (VariableName, VariableType).

[['PopGrp', 0], ['AFEOCAID', 0], ['DEOCSID', 9], ['DEOCSID7', 0], ['UICCODE', 25], ['MainUicCode', 25], ['titleoforganization', 27],
['commanderemail', 40], ['commandername', 19], ['startdate', 0], ['enddate', 0], ['reportdate', 0], ['submitdate', 0],
['E1E3M', 0], ['E1E3F', 0], ['E4E6M', 0], ['E4E6F', 0], ['E7E9M', 0], ['E7E9F', 0], ['WOM', 0], ['WOF', 0], ['O1O3M', 0], ['O1O3F', 0],
['O4AboveM', 0], ['O4AboveF', 0], ['GS1GS8M', 0], ['GS1GS8F', 0], ['GS9_SESM', 0], ['GS9_SESF', 0], ['OtherM', 0], ['OtherF', 0],
['SERVICE', 0], ['COMPONENT', 0], ['accessCodeID', 0], ['gender', 0], ['hispanic', 0], ['race1', 0], ['race2', 0], ['race3', 0],
['race4', 0], ['race5', 0], ['Race', 0], ['Maj_Min', 0], ['reside', 0], ['deployed', 0], ['fedcat', 0], ['paygrad', 0],
['branch', 0], ['type', 0], ['Q1', 0], ['Q2', 0], ['Q3', 0], ['Q4', 0], ['Q5', 0], ['Q6', 0], ['Q7', 0], ['Q8', 0], ['Q9', 0],
['Q10', 0], ['Q11', 0], ['Q12', 0], ['Q13', 0], ['Q14', 0], ['Q15', 0], ['Q16', 0], ['Q17', 0], ['Q18', 0], ['Q19', 0], ['Q20', 0],
['Q21', 0], ['Q22', 0], ['Q23', 0], ['Q24', 0], ['Q25', 0], ['Q26', 0], ['Q27', 0], ['Q28', 0], ['Q29', 0], ['Q30', 0], ['Q31', 0],
['Q32', 0], ['Q33', 0], ['Q34', 0], ['Q35', 0], ['Q36', 0], ['Q37', 0], ['Q38', 0], ['Q39', 0], ['Q39A', 0], ['Q40', 0],
['Q41', 0], ['Q42', 0], ['Q43', 0], ['Q44', 0], ['Q45', 0], ['Q46', 0], ['Q47', 0], ['Q48', 0], ['Q49', 0], ['Q50', 0], ['Q51', 0],
['Q52', 0], ['Q53', 0], ['Q54', 0], ['Q55', 0], ['Q56', 0], ['Q57', 0], ['Q58', 0], ['Q59', 0], ['Q60', 0], ['Q61', 0],
['Q62', 0], ['Q63', 0], ['Q64', 0], ['Q65', 0], ['Q66', 0], ['Q67', 0], ['Q68', 0], ['Q69', 0], ['Q70', 0], ['Q71', 0],
['Q72A', 0], ['Q72B', 0], ['Q72C', 0], ['Q72D', 0], ['Q72E', 0], ['Q72F', 0], ['Q72G', 0], ['Q72H', 0], ['Q72I', 0], ['Q72J', 0],
['Q73A', 0], ['Q73B', 0], ['Q73C', 0], ['Q73D', 0], ['Q73E', 0], ['Q73F', 0], ['Q73G', 0], ['Q73H', 0], ['Q73I', 0], ['Q73J', 0],
['Q73K', 0], ['Q74', 25], ['Q75', 0], ['Q76', 0], ['Q77A', 0], ['Q77B', 0], ['Q77C', 0], ['Q77D', 0], ['Q77E', 0], ['Q78', 0],
['Q78A', 0], ['Q79', 25], ['Q80', 0], ['Q81', 25], ['Q82', 0], ['Q83', 0], ['Q84', 25], ['Q85', 0], ['IsPaper', 0], ['LogID', 25],
['Input_date', 0], ['Q74_1', 0], ['Q74_2', 0], ['Q74_3', 0], ['Q74_4', 0], ['Q74_5', 0], ['Q74_6', 0], ['Q74_7', 0], ['Q74_8', 0],
['Q74_9', 0], ['Q74_10', 0], ['Q79_1', 0], ['Q79_2', 0], ['Q79_3', 0], ['Q79_4', 0], ['Q79_5', 0], ['Q79_6', 0], ['Q79_7', 0],
['Q79_8', 0], ['Q79_9', 0], ['Q79_10', 0], ['Q79_11', 0], ['Q81_1', 0], ['Q81_2', 0], ['Q81_3', 0], ['Q81_4', 0], ['Q81_5', 0],
['Q81_6', 0], ['Q81_7', 0], ['Q84_1', 0], ['Q84_2', 0], ['Q84_3', 0], ['Q84_4', 0], ['Q84_5', 0], ['Q84_6', 0], ['Q84_7', 0],
['Month', 0], ['Quarter', 0], ['DoD', 0], ['Group', 0], ['Rank', 0], ['RankJr', 0], ['RankJrMil', 0], ['OFFvsENL', 0],
['MILvsCIV', 0], ['Organization', 0], ['ServComponent', 0], ['BroadJrEnlGender', 0], ['JrEnlGender', 0], ['TypeTotal', 0],
['ActiveMilCiv', 0], ['ArmyVsAir', 0], ['INTENDSTAY', 0], ['FAVORITISM', 0], ['RQ19', 0], ['RQ25', 0], ['RQ32', 0], ['RQ36', 0],
['RQ40', 0], ['RQ44', 0], ['RQ47', 0], ['RQ56', 0], ['RQ62', 0], ['RQ65', 0], ['OrgCom', 0], ['TrustLead', 0], ['OrgPerf', 0],
['OrgCoh', 0], ['LeadCoh', 0], ['JobSat', 0], ['DivMgt', 0], ['OrgProc', 0], ['HelpSeek', 0], ['Exhaust', 0], ['Hazing', 0],
['Demean', 0], ['RacDisc', 0], ['SexDisc', 0], ['RelDisc', 0], ['SexHar', 0], ['Racist', 0], ['Sexist', 0], ['AgeDisc', 0],
['DisDisc', 0], ['Safety1', 0], ['Safety2', 0], ['CoCSupport1', 0], ['CoCSupport2', 0], ['CoCSupport3', 0], ['CoCSupport4', 0],
['CoCSupport5', 0], ['Publicity1', 0], ['Publicity2', 0], ['Publicity3', 0], ['CoCSupport6', 0], ['CoCSupport7', 0], ['URC1', 0],
['URC2', 0], ['URC3', 0], ['URC4', 0], ['URC5', 0], ['URC6', 0], ['URC7', 0], ['URC8', 0], ['URC9', 0], ['URC10', 0],
['SafetyPercep', 0], ['SafeLiveUF', 0], ['SafeWorkUF', 0], ['CoCSupport', 0], ['Publicity', 0], ['PublicityUF1', 0],
['PublicityUF2', 0], ['PublicityUF3', 0], ['URC7rev', 0], ['URC9rev', 0], ['URC10rev', 0], ['URC', 0], ['BarriersTotal', 0],
['BarriersTri', 0], ['Bystander1', 0], ['Bystander1UF', 0], ['Bystander2', 0], ['BystanderScale', 0], ['BystanderObs1', 0],
['BystanderObs2', 0], ['Knowledge1', 0], ['Knowledge2', 0], ['Knowledge3', 0], ['Knowledge4', 0], ['Knowledge5', 0],
['KnowledgeScale', 0], ['JrEnlvNCOvAll', 0], ['JrEnandNCO', 0], ['BystanderObs2Dich', 0], ['Metric4Composite', 0],
['Metric9CompositeR', 0], ['Metric9NEWComposite', 0], ['Metric11Composite', 0], ['SenorityGender', 0], ['PasswordsRequested', 0],
['FirstDEOCS', 0], ['Occurrence', 0], ['ACTSERVICE', 0], ['WRank', 0], ['PopNum', 0], ['filter_$', 0], ['TotalN', 0],
['indicator', 0], ['casen', 0]]


Step 4: Generate a dictionary {unit number: [unitID, begincasenum, endcasenum]}.

The statement with spss.DataStep(): starts a block in which the data can be manipulated.

with spss.DataStep():
    ds = spss.Dataset()  # create a pointer to the current active dataset
    i = 1
    dl = dict()
    # i: unit number; j = [deptid, beginningcasenum, endingcasenum]
    j = [ds.cases[0, 1][0], ds.cases[0, totalvar-1][0], ds.cases[0, totalvar-1][0]]
    dl = {i: j}
    iteration = 0
    for r in ds.cases:
        # A change in the unit ID (column 1) closes the current unit and opens a new one.
        if (r[1] != dl[i][0]):
            dl[i][2] = dl[i][1] + iteration - 1
            i = i + 1
            newj = [r[1], dl[i-1][2] + 1, dl[i-1][2] + 1]
            dl[i] = newj
            iteration = 0
        iteration = iteration + 1
    # The last unit ends at the case number of the last case.
    dl[i][2] = ds.cases[-1, totalvar-1][0]


The outcome will be as follows:

{1: [129933.0, 1.0, 62.0], 2: [135253.0, 63.0, 78.0], 3: [161333.0, 79.0, 99.0],
4: [161339.0, 100.0, 174.0], 5: [161730.0, 175.0, 195.0], 6: [162772.0, 196.0, 215.0],
7: [165204.0, 216.0, 260.0], 8: [166030.0, 261.0, 284.0], 9: [166370.0, 285.0, 304.0],
10: [166757.0, 305.0, 418.0], 11: [167247.0, 419.0, 435.0], 12: [167560.0, 436.0, 462.0],
13: [167713.0, 463.0, 553.0], 14: [167716.0, 554.0, 604.0], 15: [167938.0, 605.0, 624.0],
16: [168340.0, 625.0, 643.0], 17: [168341.0, 644.0, 659.0], 18: [168372.0, 660.0, 685.0],
19: [169151.0, 686.0, 748.0], 20: [169154.0, 749.0, 779.0], 21: [169155.0, 780.0, 794.0],
22: [169834.0, 795.0, 810.0], 23: [170115.0, 811.0, 845.0], 24: [170175.0, 846.0, 863.0],
25: [171053.0, 864.0, 904.0], 26: [171092.0, 905.0, 918.0], 27: [171345.0, 919.0, 950.0],
28: [171533.0, 951.0, 970.0], 29: [171908.0, 971.0, 988.0], 30: [171940.0, 989.0, 1010.0],
31: [172149.0, 1011.0, 1034.0], 32: [172887.0, 1035.0, 1118.0], 33: [172920.0, 1119.0, 1167.0],
34: [172930.0, 1168.0, 1183.0], 35: ...
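The dictionary of Step 4 can also be illustrated without SPSS. The sketch below is an alternative formulation with itertools.groupby, not the code used in this project; it assumes the unit IDs are already sorted so that all cases of a unit are adjacent, and the function name and example IDs are for illustration only.

```python
from itertools import groupby


def unit_ranges(unit_ids):
    """Map unit number -> [unit ID, first case number, last case number].

    unit_ids: the unit ID of every case, in sorted order.
    Case numbers are 1-based, like $CASENUM in SPSS.
    """
    ranges = {}
    first_case = 1
    for number, (unit_id, group) in enumerate(groupby(unit_ids), start=1):
        size = sum(1 for _ in group)
        ranges[number] = [unit_id, first_case, first_case + size - 1]
        first_case += size
    return ranges


# Three hypothetical units with 3, 2 and 4 responding cases.
ids = [129933] * 3 + [135253] * 2 + [161333] * 4
ranges = unit_ranges(ids)
```

As in the outcome shown above, the first case of each unit directly follows the last case of the previous unit, which gives a quick consistency check on the dictionary.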

Step 5: Iterate on units: generate a dataset for a unit, count each cell in the group x gender crosstab, compute the difference between responses and reference population, then generate the non-response rows, and at last output this file.

for key in dl:
    with spss.DataStep():
        dsl = spss.Dataset(name="alldata")
        # Create a new dataset for each unit
        newdsl = spss.Dataset(name=None)
        # Add variables to this new dataset
        for i in range(length):
            newdsl.varlist.append(newordlist[i][0], newordlist[i][1])

        deptid = dl[key][0]
        dsNames = {newdsl.name: deptid}
        begincasenum = int(dl[key][1]) - 1
        endcasenum = int(dl[key][2])
        # One counter per cell of the Group x gender crosstab (18 cells)
        templist = [0] * 18
        v11, v12, v21, v22, v31, v32, v41, v42, v51, v52, v61, v62, v71, v72, v81, v82, v101, v102 = templist
        for row in dsl.cases[begincasenum:endcasenum]:
            # add this case to the new file
            newdsl.cases.append(row)
            # count the number of each cell in Group and gender crosstab
            if row[dsl.varlist['Group'].index] == 1 and row[dsl.varlist['gender'].index] == 1:
                v11 = v11 + 1
            if row[dsl.varlist['Group'].index] == 1 and row[dsl.varlist['gender'].index] == 2:
                v12 = v12 + 1
            if row[dsl.varlist['Group'].index] == 2 and row[dsl.varlist['gender'].index] == 1:
                v21 = v21 + 1
            if row[dsl.varlist['Group'].index] == 2 and row[dsl.varlist['gender'].index] == 2:
                v22 = v22 + 1
            if row[dsl.varlist['Group'].index] == 3 and row[dsl.varlist['gender'].index] == 1:
                v31 = v31 + 1
            if row[dsl.varlist['Group'].index] == 3 and row[dsl.varlist['gender'].index] == 2:
                v32 = v32 + 1
            if row[dsl.varlist['Group'].index] == 4 and row[dsl.varlist['gender'].index] == 1:
                v41 = v41 + 1
            if row[dsl.varlist['Group'].index] == 4 and row[dsl.varlist['gender'].index] == 2:
                v42 = v42 + 1
            if row[dsl.varlist['Group'].index] == 5 and row[dsl.varlist['gender'].index] == 1:
                v51 = v51 + 1

70 

80 

if 

rcn'f[  dsl .  v arl i st [  "Group  "  ]  .  index] 
v  52  =  v  52+1 

=  = 

5 

and 

row[ dsl . varlist[ " gender " 

. index ]= 

=  2 

81 

82 

if 

rotf[  dsl .  varlist[  "Group  "  ]  .  index] 
v  61  =  v  61+1 

=  = 

6 

and 

rm[  dsl .  varlist[  "  gender  " 

. index ]= 

=  1 

S3  1  j  j 

84 

|  if 

rotu[  dsl .  warlist[  "Group  "  ]  .  index] 
v/62  =  v  62+1 

=  = 

6 

and 

roif[  dsl .  v  arl  i  st  [  "  gender  " 

. index ]= 

=  2 

85  ! 

86  M  M 

if 

row[ dsl . v  arl i st [  "Group ' ] . index] 
v71  =  V71+1 

=  = 

7 

and 

row:[  dsl.  v  arl  i  st  [  '  gender  ' 

. index ]= 

=  1 

87 

88 

if 

row[  dsl .  varlist[  "Group  "  ]  .  index] 
v72  =  V72+1 

=  = 

7 

and 

row[ dsl . varlist[ " gender " 

. index ]= 

=  2 

80 

0© 

if 

row[  dsl .  varlist[  'Group  '  ]  .  index] 
v81  =  wSl+1 

=  = 

8 

and 

rcn'4  dsl .  v  arl  i  st  [  "  gender  ' 

. index ]= 

=  1 
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91 

92 

93 

94 

95 

96 

97 

98 

99 
100 
101 
102 

103 

104 

105 

106 

107 

108 

109 

110 
111 
112 

113 

114 

115 

116 

117 

118 

119 

120 
121 
122 

123 

124 

125 

126 

127 

128 

129 

130 

131 

132 

133 

134 

135 

136 

137 

138 

139 

140 

141 


8  and  row[dsl. varlist[ ’gender' ]. index]= =2 : 

9  and  row[dsl. varlist[ ’gender' ]. index]==l : 

9  and  rcw[ dsl .  v  arl i  st  [  1  gender ' ] .  i ndex ] = = 2 : 

10  and  row[ dsl. varlist[ "gender ']. index]==l : 
1 0  and  row[ dsl . v  arl i st [ " gender ’ ] . i ndex ] = = 2 : 


i  f  row[  dsl .  v  arl  i  st  [  '  Group  ’  ] .  i  ndex  ] 
v82  =  V82+1 

i  f  raf[  dsl .  v  arl  i  st  [  ’  Group  ’  ] .  i  ndex  ] 
v81  =  v81+l 

i f  row [ dsl . v arl i st [ ' Group ’ ] . i ndex ] 
v82  =  v/82+1 

i f  row [ dsl . v arl i st [ ' Group ’ ] . i ndex ] 

Vl01  =  V 101+1 

i  f  rm[  dsl .  v  arl  i  st  [  ’  Group  ’  ] .  i  ndex  ] 
vl02  =  v 1 02+1 

print("Ihis  is  the  key:  ",  key) 

print( "The  number  of  cases  in  each  cell  of  crosstable  in  this  unit  is:/n",  vil„vl2,v21, 

v22,v31,v32,v41,v42,v51,v52,v61,v62,v71,v72,vBl,v82,vl01,vl02) 
total  =  vll+  v  1 2+v  2 l+v2  2+v  31+v  32+ v41+v42+v  51+v  52+v  61+v  62+v  71+v  72+v  81+v  82+v  1 01 +v  1 02 
print("The  summation  of  all  cells  in  above  crosstable:" ,  total) 

#  compute  the  difference  between  reference  population  and  reponses - 

if  (  (  isinstance(dsl. cases[begimcasenum,dsl. varlist[ 'E1E3FT  ]. index][0],  int)  )  or 

(  isinstance(dsl .  cases[begincasemm,dsl .varlist[ 'E1E3H' ] .  index][0] ,  float)  )  ) 
diffll=dsl. cases[begincasenum,dsl. varlist[ ’ E1E3H' ]. index][0]  -  vll 
diffl2=dsl.  cases[ begi ncasenun, dsl . varlist[ ’E1E3F ’ ]. index][0]  -  vl2 
diff21=dsl. cases[begincasenum,dsl.  varlist[ 1E4E6M’ ].  index][0]  -  v21 
diff22=dsl. cases[beginc asenum,  dsl. varlistjf "E4E6F ’ ]. index][0]  -  v22 
di f f 31 = dsl . c  ases[ begi  nc  asenum,  dsl . v  arl i st [ 1 E  7E  9M ' ] , i ndex ] [ 0]  -  v31 
di f f 32 = dsl .  c  ases[ b egi nc  as  enum, dsl .  v  arl i st [ 1 E  7E  9F ’ ] .  i ndex ] [ 0]  -  v32 
diff41=dsl. cases[ begi  nc asenum, dsl. v arl ist[ "WOM’ ]. index ][ 0]  -  v41 
diff42=dsl.  cases[  beg!  nc  asenum,  dsl.  v  arl  ist[ ’WOF ’].  index  ]|[0]  -  v42 
di f f 51 = dsl . c  ases[ begi  nc  as  enum,  dsl . v  arl i st [ 1 0103f  t  ’ ] . i ndex ] [  0]  -  v  51 
di f f 52 = dsl .  c  ases[ begi  nc  asenum, dsl . v  arl i st [ ' 0103F ’ ] . i ndex ] [  0]  -  v  52 
diff61=dsl.  cases[  begi  nc  asenum,  dsl.  var  list  [  '04AbovtfT  ].  index  ][0]  -  v61 
di  f  f  62 = dsl .  c  ases[  begi  nc  as  enum,  dsl .  v  arl  i  st  [  1 04Abov  eF  '  ] .  i  ndex  ]  [  0]  -  v  62 
diff71=dsl. cases[ begi nc asenum, dsl. v arl ist[ ’GSlGSSff ’ ] . index][ 0]  -  v71 
di f f 72 = dsl . c  ases[ begi  nc  asenum,  dsl . v  arl i st [ 1 GS1GS8F ' ] . i ndex ] [ 0]  -  v  72 
di f f 81 = dsl . c  ases[ begi  nc  as  enum, dsl . v  arl i st [ 1 G59_5E  SM ' ] . i ndex ] [  0]  -  v  81 
di f  ■ f 82 = dsl . c  ases[ begi nc  asenum, dsl . v  arl i st [ ’ G59_5E  5F ' ] . i ndex ] [0]  -  v  82 
diffl01=dsl. cases[ begi  nc asenum, dsl. varlist[ 'OtherM1 ]. index][0]  -  vl01 
di ff 102 =dsl. cases[beginc asenum,dsl. var list [ 'OtherF ' ] . index][ 0]  -  vl02 
di f f  1  i st  1  di f f  1 1 1 ,  ■difm  1 ,  ,diff 21 1 ,  ■di ff 22  1 ,  ^1131 1 ,  1  diff32  1 ,  1  diff41 1 ,  ,diff421, 
|'diff51 ' ,  ' dif f 52 ' , ' dif f 61 ' ,  'diffG2 ' , 'diff71 ' ,  'diff72 ’ , 'diffSl ’ ,  ' diffS2 ' , ' diffl91 ' , 


sampl  ec  ase=  dsl .  c  ases[  begi  nc  asenum  ] 
indexof  group  =  dsl „v arl ist[ 'Group' ]. index 
i mdexof gender  =  dsl . v  arl i st [ ’ gender ' ] . i ndex 

# generate  non-resonsese  rats - 

for  var  in  difflist: 

numl=var.  replace("diff" ,"") 
num2 = i  nt  (  numl ) 
gendernum  =  int(nurt2%10) 
groupnum  =  int((num2- gendernum)/ 10) 

valueofvar=  int(eval(var)) 
if  valueofvar  <  0: 


'  dif f  102  '  J 
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142  print("Tlhie  value  of  ’wrong'  difference  is:",  valueofvar) 

143  print  ("there  exist  error  at  unit : "  ,  dl[  key  ][(>]) 

144  continue 

145  if  valueofvar  ==  &: 

146  continue 

147  for  i  in  range( valueofvar) : 

14S  for  j  in  range(totalvar) : 

149  if  j  ==  indexof  gender: 

150  sampler  asef  j  ]  =  gender  nun 

151  elif  (j  >=49)  and  (j<=199): 

152  sampler  ase[ j ]  =  Hone 

153  elif  j  ==  indexof  group: 

154  samp  let  asef j ]  =  group num 

155  elif  (j  >=201)  and  (j<=305): 

156  sampler ase)[  j  ]  =  Hone 

157  elif  j  ==  (totalvar-2) : 

158  sampler  asef  j  ]  =  0 

159  elif  j  ==  (totalvar-1) : 

160  sampler asej[  j  ]  =  Hone 

161  newdsl . rases . append ( sampler ase) 

162 

16^  # - 

164  strdept  =  st  r(  i  nt  (  dept  id)) 

165  n  ame= 1 i st ( dsN  ames . key  s( ) ) [  0] 

166  spss. Submit (r . 

167  DATASET  ACTIVATE  %<name)s. 

168  SAVE  0UTFILE=’E:\UEI  UAH\My  SPSS\unitfile3\unit_%( strdept )s. sav ' - 

169  DATASET  CLOSE  %(name)s. 

170  .  SlocalsQ) 

171 

1 72  spss . Submit ( r . 

173  DATASET  ACTIVATE  all data. 

174  DATASET  CLOSE  ALL. 

175  .  %locals<)) 

176 

177  End  Program. 

/ 
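The core of Step 5 can also be illustrated without SPSS. The sketch below is not the report's implementation: it uses a Counter in place of the eighteen cell variables, and the function name and toy data are hypothetical. It counts responders per (group, gender) cell, subtracts the counts from the unit's reference population, and emits one placeholder row per missing respondent. Cells with a negative difference are reported and skipped, as in the listing.

```python
from collections import Counter

def nonresponse_rows(responders, reference_counts):
    """Return (rows, errors) for one unit.

    responders: list of (group, gender) tuples for cases that answered.
    reference_counts: dict {(group, gender): population count} for the unit.
    rows holds one (group, gender) tuple per non-respondent to be generated;
    errors lists cells where responders outnumber the reference population.
    """
    observed = Counter(responders)
    rows, errors = [], []
    for cell, population in sorted(reference_counts.items()):
        diff = population - observed.get(cell, 0)
        if diff < 0:
            errors.append((cell, diff))  # more responses than population
            continue
        rows.extend([cell] * diff)
    return rows, errors

# Hypothetical unit: two male and one female responder in different groups.
responders = [(1, 1), (1, 1), (2, 2)]
reference = {(1, 1): 4, (1, 2): 1, (2, 2): 0}
rows, errors = nonresponse_rows(responders, reference)
print(rows)
print(errors)
```

Keying the counts by cell avoids both the long chain of if statements and the call to eval, at the cost of departing from the variable names used in the listing.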


The  outcome  is  as  follows: 


[Screenshot: folder listing of the generated unit files, one file per unit, named unit_<unit ID>
(for example unit_181109, unit_182200, unit_183170, unit_183391).]
Occurrence  ACTSERVICE  WRank  PopNum    filter_$  Group  gender  indicator  casen     totalcases
1.00        7.00        4.00   15930.00  .00       2.00   1.00    1.00       10972.00  1392.00
1.00        7.00        4.00   .00       .00       2.00   1.00    1.00       10973.00  1392.00
1.00        7.00        4.00   47106.00  .00       2.00   1.00    1.00       10974.00  1392.00
1.00        ...
6. Weighting and Outcome


We have three ways to compute weights. The first is Option 1 in Section 2.3.2, with the unit population
as the reference population. The following table shows the outcome of this method.

Table  7:  Weights  from  Unit  Reference  Population 
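As a minimal illustration of weighting against a reference population, the sketch below assumes the conventional post-stratification definition, in which the weight of a cell is its reference population count divided by its number of respondents. Section 2.3.2 is the authoritative definition of Option 1; the function name, cell labels, and counts here are hypothetical.

```python
def poststratification_weights(population_counts, respondent_counts):
    """Return {cell: population count / respondent count}.

    Cells without respondents are skipped because no case can carry the weight.
    """
    weights = {}
    for cell, n_resp in respondent_counts.items():
        if n_resp > 0:
            weights[cell] = population_counts[cell] / float(n_resp)
    return weights

# Hypothetical unit with two Group x gender cells.
population = {("E1-E3", "M"): 40, ("E1-E3", "F"): 10}
respondents = {("E1-E3", "M"): 10, ("E1-E3", "F"): 5}
weights = poststratification_weights(population, respondents)
print(weights)

# The weighted respondent total reproduces the population total.
weighted_total = sum(weights[c] * respondents[c] for c in weights)
print(weighted_total)
```

The same function applies to either reference population: passing unit-level counts gives a weight per unit and cell, while passing counts for the whole population gives one weight per cell.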
The second is also Option 1 in Section 2.3.2, but the reference population is the whole
population.


Table  8:  Weights  from  Whole  Reference  Population 
The third uses Component 3 (the Decision Tree method in Section 3). The outcome of the decision tree
is shown below.


Figure  2:  Decision  Tree 
The following table is derived from the decision tree. The column “WeightfromDecisionTree” contains the weight.

Table  9:  Weights  from  Decision  Tree 
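The sketch below shows one common way to turn tree output into weights, assuming the usual response-propensity adjustment in which the weight is the inverse of the predicted probability of response in a terminal node. Whether the “WeightfromDecisionTree” column was computed exactly this way is an assumption; the function name and node counts are hypothetical.

```python
def propensity_weights(node_counts):
    """node_counts: {node_id: (responders, nonresponders)} -> {node_id: weight}.

    The response probability of a node is responders / (responders + nonresponders),
    and the weight is its inverse.
    """
    weights = {}
    for node, (resp, nonresp) in node_counts.items():
        if resp == 0:
            continue  # no responders in this node to carry a weight
        weights[node] = (resp + nonresp) / float(resp)
    return weights

# Hypothetical terminal nodes: 30% and 50% response rates.
nodes = {1: (30, 70), 2: (50, 50)}
tree_weights = propensity_weights(nodes)
print(tree_weights)
```

Nodes with a low response rate receive a large weight, so their few responders stand in for many non-responders.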
After computing these three types of weights, we use algorithm Component 5 to attach them
to each case.


Table  10:  Three  Weights  in  Original  Dataset 
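Attaching weights to cases amounts to a lookup by cell. The following sketch is not Component 5 itself. It assumes each case carries its unit, Group, and gender, and that the three sets of weights are stored in dictionaries keyed by cell. Apart from “WeightfromDecisionTree”, “Group”, and “gender”, all names are hypothetical.

```python
def attach_weights(cases, unit_weights, whole_weights, tree_weights):
    """Return copies of the cases with three weight columns added.

    unit_weights is keyed by (unit, group, gender); the other two by (group, gender).
    A missing key yields None, so unmatched cases stay visible.
    """
    weighted = []
    for case in cases:
        cell = (case["Group"], case["gender"])
        row = dict(case)
        row["WeightUnit"] = unit_weights.get((case["unit"],) + cell)
        row["WeightWhole"] = whole_weights.get(cell)
        row["WeightfromDecisionTree"] = tree_weights.get(cell)
        weighted.append(row)
    return weighted

# Hypothetical case and weight tables.
cases = [{"unit": 101, "Group": 1, "gender": 2}]
result = attach_weights(cases, {(101, 1, 2): 2.5}, {(1, 2): 3.0}, {(1, 2): 2.8})
print(result[0])
```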
Next, we show the effects of the weights on the distributions of some group and gender variables:

Figure  3:  Effects  of  Weight  on  Distribution  of  Variables 
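To make the effect of a weight on a distribution concrete, the sketch below compares unweighted and weighted category shares for a toy gender variable. The function name and data are hypothetical and are not taken from the DEOCS dataset.

```python
from collections import defaultdict

def distribution(values, weights=None):
    """Return {category: share}, using unit weights when none are given."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}

# Hypothetical respondents: three coded 1, one coded 2 with a larger weight.
gender = [1, 1, 1, 2]
case_weights = [1.0, 1.0, 1.0, 3.0]
unweighted = distribution(gender)
weighted = distribution(gender, case_weights)
print(unweighted)
print(weighted)
```

In this toy case the under-represented category moves from a 25% share to a 50% share once the weights are applied.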
We then used t-tests to compare the three types of weights.

Figure  4:  Comparing  Weights  by  t-Test 
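The report does not state which form of the t-test was used. Since all three weights are attached to the same cases, the sketch below assumes a paired comparison and computes the paired t statistic and its degrees of freedom with the standard library only. The function name and the weight values are hypothetical.

```python
import math

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / float(n)
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(variance / n), n - 1

# Hypothetical weights for four cases under two weighting methods.
unit_w = [2.0, 3.0, 4.0, 5.0]
tree_w = [1.0, 3.0, 3.0, 5.0]
t_stat, df = paired_t(unit_w, tree_w)
print(t_stat, df)
```

A t statistic close to zero would be consistent with two weighting methods that agree on average.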
From the above results and comparison, we can draw the following conclusions:


>  In the Decision Tree, “Group” and “Gender” are used to predict a response. Using more predictor
variables would likely lead to better weights.

>  The logistic regression and decision tree methods are most helpful when we are selecting a sample
from an informative sampling frame.

>  The weight computed from the unit reference population is “similar” on average to the weight from
the decision tree.

>  Weights do change the distributions of the variables.

7.  References 

(1) Barbara Lepidus Carlson, Stephen Williams, “A COMPARISON OF TWO METHODS TO
ADJUST WEIGHTS FOR NON-RESPONSE: PROPENSITY MODELING AND WEIGHTING
CLASS ADJUSTMENTS”, Proceedings of the Annual Meeting of the American Statistical
Association, August 5-9, 2001.

(2) Che-Chern Lin, Hung-Jen Yang and Lung-Hsing Kuo, “Behaviour analysis of internet survey
completion using decision trees: An exploratory study”, Online Information Review, 1 July 2008,
pp. 117-130.




(3) Chris Skinner, “What is Survey Weighting?”,
http://eprints.ncrm.ac.uk/1358/1/Weighting%20Festival%202010.pdf

(4) Cosma Shalizi, “Logistic Regression”, www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf

(5)  David  R.  Johnson,  “Using  Weights  in  the  Analysis  of  Survey  Data”, 
http://web.pop.psu.edu/projects/help_archive/help.pop.psu.edu/help-by-statistical- 
method/weighting/Introduction%20to%20survey%20weights%20pri%20version.ppt/at_download/ 
Introduction%20to%20survey%20weights%20pri%20version.ppt,  November  2008. 

(6) Deng, P.-S. (1996), “Using case-based reasoning approach to the support of ill-structured
decisions”, European Journal of Operational Research, Vol. 93, pp. 511-21.

(7) Eun Sul Lee, Ronald N. Forthofer, “Analyzing Complex Survey Data”, SAGE Publications, Inc.,
2nd edition, September 22, 2005.

(8) Floyd J. Fowler, “Survey Research Methods”, SAGE Publications, Inc., 5th edition,
September 18, 2013.

(9) Graham Kalton and Ismael Flores-Cervantes, “Weighting Methods”, Journal of Official Statistics,
Vol. 19, No. 2, 2003, pp. 81-97.

(10)  IBM,  “IBM  SPSS  Decision  Trees  20”. 

(11)  IBM,  “Python  Reference  Guide  for  IBM  SPSS  Statistics”. 

(12)  IBM,  “IBM  SPSS  Modeler  18.0  User's  Guide”. 

(13)  Ibrahim  S.  Yansaneh,  “Construction  and  use  of  sample  weights”,  Expert  Group  Meeting  to 
Review  the  Draft  Handbook  on  Designing  of  Household  Sample  Surveys,  3-5  December  2003. 

(14)  Indurkhya,  N.  and  Weiss,  S.M.  (1998),  “Estimating  performance  gains  for  voted  decision  trees”, 
Intelligent  Data  Analysis,  Vol.  2,  pp.  303-10. 

(15) Jae Kwang Kim, C. J. Skinner, “Weighting in survey analysis under informative sampling”,
Biometrika, Volume 100, Issue 2, June 2013.

(16)  Leslie  Kish,  Survey  Sampling.  New  York:  John  Wiley  and  Sons,  1965,  1995. 

(17)  Mark  Lutz,  “Learning  Python”,  Fifth  Edition,  O’Reilly  Media,  Inc.,  June  2013 

(18)  Mendonca,  L.F.,  Vieira,  S.M.  and  Sousa,  J.M.C.  (2007),  “Decision  tree  search  methods  in  fuzzy 
modeling  and  classification”,  International  Journal  of  Approximate  Reasoning,  Vol.  44,  pp.  106- 
23. 

(19)  Mugambi,  E.M.,  Hunter,  A.,  Oatley,  G.  and  Kennedy,  L.  (2004),  “Polynomial-fuzzy  decision  tree 
structures  for  classifying  medical  data”,  Knowledge  Based  Systems,  Vol.  17,  pp.  81-7. 

(20) Neil Malhotra, Annie Franco, Gabor Simonovits, L.J. Zigerell, “Developing Standards for Post-
Stratification Weighting in Population-Based Survey Experiments”,
web.stanford.edu/~neilm/weights_mayl_final_identified.pdf

(21)  PEAS,  "Adjusting  for  non-response  by  weighting", 
http://www.restore.ac.uk/PEAS/nonresponse.php. 

(22)  Richard  J.  Harris,  “Mini-Report:  Use  of  Weights  with  a  Sample  from  the  DEOCS”,  Version  3.3, 
3/17-3/24,  20111,  DEOMI,  Summer  2014. 

(23)  Robert  M.  Groves,  et  al.,  Survey  Methodology,  2nd  edition,  Hoboken,  NJ:  John  Wiley  and  Sons, 


(24)  Sarasin,  F.P.  (2001),  “Decision  analysis  and  its  application  in  clinical  medicine”,  European 
Journal  of  Obstetrics  &  Gynecology  and  Reproductive  Biology,  Vol.  94,  pp.  172-9. 

(25)  Swaroop  C  H,  “A  Byte  of  Python”,  ebshelf  Inc.,  September  29,  2013 

(26)  Tsujino,  K.  (1995),  “Implementation  and  refinement  of  decision  trees  using  neural  networks  for 
hybrid  knowledge  acquisition”,  Artificial  Intelligence  in  Engineering,  Vol.  9,  pp.  265-75. 

(27) Wang, J.-L. and Chan, S.-H. (2006), “Stock market trading rule discovery using two-layer bias
decision tree”, Expert Systems with Applications, Vol. 30, pp. 605-11.

(28) Wikipedia, “Python (programming language)”,
https://en.wikipedia.org/wiki/Python_(programming_language)#cite_note-About-25

(29) Yan-yan Song, Ying Lu, “Decision tree methods: applications for classification and prediction”,
Shanghai Arch Psychiatry, 2015; 27(2): 130-135.




Appendix  A:  Algorithm  Component  1 

output close all.
GET FILE='E:\WEI UAH\My SPSS\Test Files\AlldatawithIndicator.sav'.
DATASET NAME alldata.
SORT CASES BY AFEOCAID.
compute casen = $CASENUM.
formats casen(f12.0).
VARIABLE LEVEL casen (SCALE).
EXECUTE.

BEGIN PROGRAM.
import spss
# generate a list of variable names
ordlist = []
ordlist2 = []
for i in range(spss.GetVariableCount()):
    ordlist.append(spss.GetVariableName(i))
    ordlist2.append(spss.GetVariableType(i))

totalvar = spss.GetVariableCount()
# change from Unicode to string
length = len(ordlist)
newordlist = []
for i in range(length):
    newordlist.append([ordlist[i].encode('ascii', 'ignore'), ordlist2[i]])
# end
# generate a dictionary {unit number: [unitID, begincasenum, endcasenum]}
# with spss.DataStep() starts a block to manipulate data
with spss.DataStep():
    ds = spss.Dataset()  # create a pointer to point at current active dataset
    i = 1
    dl = dict()
    # i: unit number; j = [deptid, beginningcasenum, endingcasenum]
    j = [ds.cases[0, 1][0], ds.cases[0, totalvar-1][0], ds.cases[0, totalvar-1][0]]
    dl = {i: j}
    iteration = 0
    for r in ds.cases:
        if (r[1] != dl[i][0]):
            dl[i][2] = dl[i][1] + iteration - 1
            i = i + 1
            newj = [r[1], dl[i-1][2] + 1, dl[i-1][2] + 1]
            dl[i] = newj
            iteration = 0
        iteration = iteration + 1
    dl[i][2] = ds.cases[-1, totalvar-1][0]
# following is a long process: for each unit compute each cell in crosstab
# and compute the difference
for key in dl:
    with spss.DataStep():
        ds1 = spss.Dataset(name="alldata")
        # Create a new dataset for each unit
        newds1 = spss.Dataset(name=None)
        # Add variables to this new dataset
        for i in range(length):
            newds1.varlist.append(newordlist[i][0], newordlist[i][1])

        deptid = dl[key][0]
        dsNames = {newds1.name: deptid}
        begincasenum = int(dl[key][1]) - 1
        endcasenum = int(dl[key][2])

        templist = [0] * 18
        v11, v12, v21, v22, v31, v32, v41, v42, v51, v52, v61, v62, v71, v72, v81, v82, v101, v102 = templist
        for row in ds1.cases[begincasenum:endcasenum]:
            # add this case to the new file
            newds1.cases.append(row)
            # count the number of each cell in Group and gender crosstab
            if row[ds1.varlist['Group'].index] == 1 and row[ds1.varlist['gender'].index] == 1:
                v11 = v11 + 1
            if row[ds1.varlist['Group'].index] == 1 and row[ds1.varlist['gender'].index] == 2:
                v12 = v12 + 1
            if row[ds1.varlist['Group'].index] == 2 and row[ds1.varlist['gender'].index] == 1:
                v21 = v21 + 1
            if row[ds1.varlist['Group'].index] == 2 and row[ds1.varlist['gender'].index] == 2:
                v22 = v22 + 1
            if row[ds1.varlist['Group'].index] == 3 and row[ds1.varlist['gender'].index] == 1:
                v31 = v31 + 1
            if row[ds1.varlist['Group'].index] == 3 and row[ds1.varlist['gender'].index] == 2:
                v32 = v32 + 1
            if row[ds1.varlist['Group'].index] == 4 and row[ds1.varlist['gender'].index] == 1:
                v41 = v41 + 1
            if row[ds1.varlist['Group'].index] == 4 and row[ds1.varlist['gender'].index] == 2:
                v42 = v42 + 1
            if row[ds1.varlist['Group'].index] == 5 and row[ds1.varlist['gender'].index] == 1:
                v51 = v51 + 1
            if row[ds1.varlist['Group'].index] == 5 and row[ds1.varlist['gender'].index] == 2:
                v52 = v52 + 1
            if row[ds1.varlist['Group'].index] == 6 and row[ds1.varlist['gender'].index] == 1:
                v61 = v61 + 1
            if row[ds1.varlist['Group'].index] == 6 and row[ds1.varlist['gender'].index] == 2:
                v62 = v62 + 1
            if row[ds1.varlist['Group'].index] == 7 and row[ds1.varlist['gender'].index] == 1:
                v71 = v71 + 1
            if row[ds1.varlist['Group'].index] == 7 and row[ds1.varlist['gender'].index] == 2:
                v72 = v72 + 1
            if row[ds1.varlist['Group'].index] == 8 and row[ds1.varlist['gender'].index] == 1:
                v81 = v81 + 1
            if row[ds1.varlist['Group'].index] == 8 and row[ds1.varlist['gender'].index] == 2:
                v82 = v82 + 1
            if row[ds1.varlist['Group'].index] == 9 and row[ds1.varlist['gender'].index] == 1:
                v81 = v81 + 1
            if row[ds1.varlist['Group'].index] == 9 and row[ds1.varlist['gender'].index] == 2:
                v82 = v82 + 1
            if row[ds1.varlist['Group'].index] == 10 and row[ds1.varlist['gender'].index] == 1:
                v101 = v101 + 1
            if row[ds1.varlist['Group'].index] == 10 and row[ds1.varlist['gender'].index] == 2:
                v102 = v102 + 1
        print("This is the key: ", key)
        print("The number of cases in each cell of crosstable in this unit is:\n", v11, v12, v21,
              v22, v31, v32, v41, v42, v51, v52, v61, v62, v71, v72, v81, v82, v101, v102)
        total = v11+v12+v21+v22+v31+v32+v41+v42+v51+v52+v61+v62+v71+v72+v81+v82+v101+v102
        print("The summation of all cells in above crosstable:", total)
        # compute the difference between reference population and responses
        if ((isinstance(ds1.cases[begincasenum, ds1.varlist['E1E3M'].index][0], int)) or
                (isinstance(ds1.cases[begincasenum, ds1.varlist['E1E3M'].index][0], float))):
            diff11 = ds1.cases[begincasenum, ds1.varlist['E1E3M'].index][0] - v11
            diff12 = ds1.cases[begincasenum, ds1.varlist['E1E3F'].index][0] - v12
            diff21 = ds1.cases[begincasenum, ds1.varlist['E4E6M'].index][0] - v21
            diff22 = ds1.cases[begincasenum, ds1.varlist['E4E6F'].index][0] - v22
            diff31 = ds1.cases[begincasenum, ds1.varlist['E7E9M'].index][0] - v31
            diff32 = ds1.cases[begincasenum, ds1.varlist['E7E9F'].index][0] - v32
            diff41 = ds1.cases[begincasenum, ds1.varlist['WOM'].index][0] - v41
            diff42 = ds1.cases[begincasenum, ds1.varlist['WOF'].index][0] - v42
            diff51 = ds1.cases[begincasenum, ds1.varlist['O1O3M'].index][0] - v51
            diff52 = ds1.cases[begincasenum, ds1.varlist['O1O3F'].index][0] - v52
            diff61 = ds1.cases[begincasenum, ds1.varlist['O4AboveM'].index][0] - v61
            diff62 = ds1.cases[begincasenum, ds1.varlist['O4AboveF'].index][0] - v62
            diff71 = ds1.cases[begincasenum, ds1.varlist['GS1GS8M'].index][0] - v71
            diff72 = ds1.cases[begincasenum, ds1.varlist['GS1GS8F'].index][0] - v72
            diff81 = ds1.cases[begincasenum, ds1.varlist['GS9_SESM'].index][0] - v81
            diff82 = ds1.cases[begincasenum, ds1.varlist['GS9_SESF'].index][0] - v82
            diff101 = ds1.cases[begincasenum, ds1.varlist['OtherM'].index][0] - v101
            diff102 = ds1.cases[begincasenum, ds1.varlist['OtherF'].index][0] - v102
            difflist = ['diff11', 'diff12', 'diff21', 'diff22', 'diff31', 'diff32', 'diff41', 'diff42',
                        'diff51', 'diff52', 'diff61', 'diff62', 'diff71', 'diff72', 'diff81', 'diff82',
                        'diff101', 'diff102']

            samplecase = ds1.cases[begincasenum]
            indexofgroup = ds1.varlist['Group'].index
            indexofgender = ds1.varlist['gender'].index

            # generate non-response rows
            for var in difflist:
                num1 = var.replace("diff", "")
                num2 = int(num1)
                gendernum = int(num2 % 10)
                groupnum = int((num2 - gendernum) / 10)

                valueofvar = int(eval(var))
                if valueofvar < 0:
                    print("The value of 'wrong' difference is:", valueofvar)
                    print("there exist error at unit:", dl[key][0])
                    continue
                if valueofvar == 0:
                    continue
                for i in range(valueofvar):
                    for j in range(totalvar):
                        if j == indexofgender:
                            samplecase[j] = gendernum
                        elif (j >= 49) and (j <= 199):
                            samplecase[j] = None
                        elif j == indexofgroup:
                            samplecase[j] = groupnum
                        elif (j >= 201) and (j <= 305):
                            samplecase[j] = None
                        elif j == (totalvar - 2):
                            samplecase[j] = 0
                        elif j == (totalvar - 1):
                            samplecase[j] = None
                    newds1.cases.append(samplecase)

    strdept = str(int(deptid))
    name = list(dsNames.keys())[0]
    spss.Submit(r"""
DATASET ACTIVATE %(name)s.
SAVE OUTFILE='E:\WEI UAH\My SPSS\unitfile3\unit_%(strdept)s.sav'.
DATASET CLOSE %(name)s.
""" % locals())
spss.Submit(r"""
DATASET ACTIVATE alldata.
DATASET CLOSE ALL.
""" % locals())
End Program.
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Appendix  B:  Algorithm  Component  2 


OUTPUT CLOSE ALL.
begin program.
import glob
import spss

# join all the sav files in the directory specified below.
# Remember to save the resulting file at the end
# The Data Editor may not show the last partial block of files
# until the SAVE or another procedure is run.
cmd = []
i = 0
first = True
for fcount, f in enumerate(glob.glob("E:/WEI UAH/My SPSS/unitfile3/*.sav")):
    # specification for files to join
    if i >= 49:
        if first:
            cmdroot = ["ADD FILES"]
            first = False
        else:
            cmdroot = ["ADD FILES /FILE=*"]
        cmdroot.extend(cmd)
        spss.Submit(cmdroot)
        i = 0
        cmd = []

    cmd.append("""/FILE="%s" """ % f)
    i += 1

# leftovers from last block
if cmd:
    cmdroot = ['ADD FILES /FILE=*']
    cmdroot.extend(cmd)
    spss.Submit(cmdroot)

print("Files merged: %i, leftover count: %i" % (fcount + 1, i))

spss.Submit(r"""
SAVE OUTFILE='E:/WEI UAH/My SPSS/unitfile3/allcombined.sav'.
""")
end program.
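The batching idea in Component 2 can be tried without SPSS by building the command strings only. The sketch below is an illustration rather than the report's code. It assumes the reason for the blocks of 49 files is that ADD FILES accepts a limited number of inputs per command, so every command after the first also reads the active dataset (/FILE=*). The function name and file names are hypothetical.

```python
def add_files_commands(paths, batch_size=49):
    """Return a list of ADD FILES command strings, one per batch of paths.

    The first command reads only saved files; each later command also
    includes the active dataset so that earlier batches are kept.
    """
    commands = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        parts = ["ADD FILES"]
        if commands:
            parts.append("/FILE=*")  # keep what has been merged so far
        parts.extend('/FILE="%s"' % path for path in batch)
        commands.append(" ".join(parts) + ".")
    return commands

# Hypothetical file names for 120 unit files.
paths = ["unit_%d.sav" % n for n in range(1, 121)]
commands = add_files_commands(paths)
print(len(commands))
```

Unlike the listing, the sketch refers to the active dataset only after a first command exists, so a run with fewer than 49 files does not depend on a dataset already being open.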
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Appendix  C:  Algorithm  Component  3 


OUTPUT CLOSE ALL.

* Decision Tree.
TREE indicator [n] BY Group [n] gender [n]
  /TREE DISPLAY=TOPDOWN NODES=STATISTICS BRANCHSTATISTICS=YES NODEDEFS=YES SCALE=AUTO
  /DEPCATEGORIES USEVALUES=[.00 1.00] TARGET=[.00 1.00]
  /PRINT MODELSUMMARY CLASSIFICATION RISK
  /GAIN CATEGORYTABLE=YES TYPE=[NODE] SORT=DESCENDING CUMULATIVE=NO
  /PLOT GAIN INDEX RESPONSE INCREMENTS
  /SAVE NODEID PREDVAL PREDPROB
  /METHOD TYPE=EXHAUSTIVECHAID
  /GROWTHLIMIT MAXDEPTH=AUTO MINPARENTSIZE=100 MINCHILDSIZE=20
  /VALIDATION TYPE=NONE OUTPUT=BOTHSAMPLES
  /CHAID ALPHASPLIT=0.01 SPLITMERGED=YES CHISQUARE=PEARSON CONVERGE=0.001 MAXITERATIONS=200
    ADJUST=BONFERRONI
  /COSTS EQUAL
  /MISSING NOMINALMISSING=MISSING.


LOGISTIC REGRESSION VARIABLES indicator
  /METHOD=ENTER PredictedProbability_2
  /SAVE=PRED
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
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Appendix  D:  Algorithm  Component  4-1 


output close all.

GET FILE='E:\WEI UAH\My SPSS\Test Files\AlldatawithoutIndicator.sav'.
DATASET NAME alldata.
SORT CASES BY AFEOCAID.
compute casen = $CASENUM.
formats casen(f12.0).
VARIABLE LEVEL casen (SCALE).
EXECUTE.

BEGIN PROGRAM.
import spss

totalvar = spss.GetVariableCount()
# works: generate a dictionary {unit number: [unitID, begincasenum, endcasenum]}
# make sure you have case number
with spss.DataStep():
    ds = spss.Dataset(name="alldata")
    i = 1
    dl = dict()
    # i: unit number; j = [deptid, beginningcasenum, endingcasenum]; make sure you have case number
    j = [ds.cases[0, 1][0], ds.cases[0, totalvar-1][0], ds.cases[0, totalvar-1][0]]
    dl = {i: j}
    iteration = 0
    for r in ds.cases:
        if (r[1] != dl[i][0]):
            dl[i][2] = dl[i][1] + iteration - 1
            i = i + 1
            newj = [r[1], dl[i-1][2] + 1, dl[i-1][2] + 1]
            dl[i] = newj
            iteration = 0
        iteration = iteration + 1
    dl[i][2] = ds.cases[-1, totalvar-1][0]
#
weightdic = dict()
for key in dl:

    weight = [1.0] * 18
    with spss.DataStep():
        ds1 = spss.Dataset(name="alldata")
        #
        begincasenum = int(dl[key][1]) - 1
        endcasenum = int(dl[key][2])
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44 

45 

46 
41 
46 

49 

50 

51 

52 

53 

54 

55 

56 
51 
56 

59 

60 
61 
62 

63 

64 

65 

66 
67 
66 

69 

70 

71 

72 

73 

74 

75 

76 

77 
76 
79 
60 
61 
62 

83 

84 
65 
86 
87 
86 
89 
9© 

91 

92 

93 

94 

95 

96 

97 


f 


-working  on  here - 

vll,vl2,v21,v22,v31,v32,v41,v42,v51,v52,v&I,v62,v71,v72,vBl,vB2,vlBl,vl©2  =  [0.  ©]*18 
■For  row  in  dsl .  cases[  begincasenutn  :  endcasenun] : 

i-F  row[dsl. varlist [ 'Group ']. index]  ==  1  and  row[dsl.varlist[ "gender ']. index ]= 
vll  =  vll+1 

i-F  row[dsl. varlist [ 'Group ']. index]  ==  1  and  row[dsl.varlist[ "gender1 ]. index ]= 
vl2  =  vl2+l 

if  row[dsl. varlist[ 'Group " ]. index]  ==  2  and  row[dsl. varlist [ "gender ']. index ]= 
v21  =  v21+l 

if  row[dsl. varlist [ 'Group ']. index]  ==  2  and  row[dsl. varlist [ "gender ']. index ]= 
v22  =  v22+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  3  and  row[dsl. varlist [ "gender ']. index ]= 
v31  =  v31+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  3  and  row[dsl. varlist [ "gender ']. index ]= 
v32  =  v32+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  4  and  row[dsl. varlist [ "gender1 ]. index ]= 
v41  =  v41+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  4  and  row[dsl. varlist [ "gender ']. index ]= 
v42  =  v42+l 

if  row[dsl. varlist [ 'Group r ]. index]  ==  5  and  row[dsl. varlist [ "gender1 ]. index ]= 
v51  =  v51+l 

if  row[dsl. varlist [ 'Group r ]. index]  ==  5  and  row[dsl. varlist [ "gender1 ]. index ]= 
v52  =  v52+l 

if  row[dsl. varlist [ 'Group ']. index]  ==  6  and  row[dsl. varlist [ "gender1 ]. index ]= 
v61  =  v61+l 

if  row[dsl. varlist [ 'Group ']. index]  ==  6  and  row[dsl. varlist [ "gender ']. index ]= 
v62  =  v62+l 

if  row[dsl. var list [ 'Group " ]. index]  ==  7  and  row[dsl. varlist [ "gender ']. index ]= 
v71  =  v71+l 

if  row[dsl. varlist [ 'Group "]. index]  ==  7  and  row[dsl. varlist [ "gender ']. index ]= 
v72  =  v72+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  8  and  row[dsl«varlist [ "gender" ]. index ]= 
vBl  =  v81+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  8  and  row[dsl.varlist [ "gender" ]. index ]= 
v82  =  v82+l 

if  row[ dsl. varlist [ 'Group ']. index]  ==  9  and  row[dsl«varlist [ "gender" ]. index ]= 
vBl  =  vBl+1 

if  row[dsl. varlist [ 'Group ']. index]  ==  9  and  row[dsl. varlist [ "gender" ]. index ]= 
v82  =  V82+1 

if  row[dsl. varlist [ ’Group" ]. index]  ==  10  and  row[dsl. varlist [ "gender" ]. index ]= 
V101  =  V101+1 

if  row[dsl. varlist [ 'Group' ]. index]  ==  1©  and  row[dsl. varlist [ "gender" ]. index ]= 
vl©2  =  V102+1 


=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 
=  1: 
=  2: 


total  =  float (vll+  vl2+v21+v22+v31+v32+v41+v42+v51+v52+v61+v62+v71+v72+vBl+vB2+vl©l+vl©2 ) 
respondernumi=[  vll,  vl2,  v21,  v22,  v31,  v32,  v41,  v42,  v51,  v52,  v61, 
v62,  v71,  v72,  v81,  vS2,  vl©l,  vl©2] 

res  ponders  rat  io=[  vll/t  ot  al,  vl2/total,  v21/total,  v22/total,  v31/total,  v32/total,  v41/total, 
v4 2 /total,  v51/t ot al,  v52/total,  v 61  total,  v62/total,  v71/total,  v72/total,  v81/total, 
v  82  /t  ot  al ,  v  1  ©1  /t  ot  al ,  v  1  ©2  /t  ot  al  ] 

referencepop=[0.  ©]+:18;.  referenceratio=[©.  0]  h18 

if  (  isinst  ance(  ds  1 .  cases[  begincasenuii,  dsl .  varlist  [  E1E3M'  ].  index][0],  (int,  float)) 

and  isinst ance(dsl.  cases[begincasenuii,  dsl.  varlist  ['  E1E3F  ’].  index][0],  (int,  float)) 


X 
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and  isinst  ance(dsl.  cases[  begincasenuim,  dsl .  varlist[  'E4E6fl  ].  index][©],  <  iot  *  float)) 
and  isinst  ance{ dsl.  casesEbegincasenuitj, dsl.  varlist [ ’E4E6F '  ].  index][©],  (int,  float)) 
and  isinst  ance(  dsl .  cases[  begincasenumij.  dsl .  varlist  [  ’ E 7E  9M '  ] .  index][  ©],  <int±  float)) 
and  isinst ance(dsl. cases[ begincasenum, dsl. varlist [ 'E7E9F  ]. index][©],  <int±  float)) 
and  isinst  am=e(  dsl.  cases[ begincasenum,  dsl.  varlist  [  'WOM  ].  index]];©],  (int,  float)) 
and  isinst ance{ dsl. cases[begincasenum, dsl. varlist [ 'WOF  ]. index][©],  {int,  float)) 
and  isinst  ance{ dsl.  cases[begincasenuii, dsl.  varlist [ 'OlOlflT  ].  index]];©],  (int,  float)) 
and  isinst  ance(  dsl.  cases[ begincasenum,  dsl.  varlist  [  '0103F  ].  index][©],  (int,  float)) 
and  isinst ance< dsl. cases[ begincasenum, dsl. varlist [ '{^AboveM' ]. index][©],  (int,  float)) 
and  isinst ance( dsl. cases[ begincasenum, dsl. varlist [ '04AboveF ']. index][©],  (int,  float)) 

168  and  isinst  am=e(  dsl.  cases[ beginc  as  enum,  dsl.  varlist  [  ’GS1GS8/I  ].  index]  [©■],  (int,  float)) 

169  and  isinst ance( dsl. c as es[ begincasenum, dsl. varlist [ ’GS1GS8F  ]. index][©],  (int,  float)) 

116  and  isinst ance( dsl. c as es[ begincasenum, dsl. varlist [ ,GS9_SESM‘ ]. index][6],  (int,  float)) 

111  and  isinst  ance(  dsl.  c as es[ begincasenum,  dsl.  varlist  [  'GS9_SESF  '].  index][©],  (int,  float)) 

112  and  isinst  ance(  dsl.  c as es[ begincasenum,  dsl.  varlist  [  'Other/!'  ].  index][©],  (int,  float)) 

and  isinst  ance(  dsl.  c as es [ begi nc as enum, dsl. varlist [  ’OtherF  '].  index][©],  (int,  float)) 

1M  ): 

115  referent  epop[  ©]  =  ds  1 .  c  as  es  [  be  gi  nc  as  enum,  ds  1 .  v  arl  ist[  1 E 1 E  3M '  ] .  i  ndex  ]  [  ©■] 

116  referencepop[l]  =  dsl. c as es[ begincasenum, dsl. varlist [ ‘E IE 3F ]. index][©] 

117  referent epop[ 2]  =  dsl. cases[ begincasenum, dsl. varlist [ ‘E4E 6/1  ’ ]. index] [6] 

118  ref  er  enc  epo-p[  3  ]  =  ds  1 .  c  as  es  [  b  egi  nc  as  enum, ds 1 . v  arl  ist[  1 E  4  E  6F  "  ] .  i  ndex  ]  [  ©■] 

119  referencepop[4]  =  dsl. cases[ begincasenum, dsl. varlist [ ‘E7E9M’ ]. index] [6] 
ref  erencepop[  5]  =  dsl.  cases  [beginc  as  enow,  dsl. varlist  [  1 E7E9F  "  ].index][6] 
ref erencepop[ 6]  =  dsl. cases [beginc as enum, dsl. varlist [ 'MOM1 ]. index ][©] 

122  referencepop[7]  =  dsl.  cases[ begincasenum,  dsl.  varlist  [  'WOF  '].  index]  [6] 

123  refer  enc  epop[G]  =  dsl.  cases[ begincasenum,  dsl.  varlist  [  ‘0103M’  ].  index]  [6] 

124  ref  erencepop[  9]  =  dsl.  cases[ begincasenum,  dsl.  varlist  [  '0103F  '].  index]  [6] 

125  ref erenc epop[  1 6]  =  dsl.cases[ begi n  c  as  enum, ds 1 . v arl i s t [  1 04  AboveH 1  ] .  i ndex ] [ 6] 

126  referencepop[ll]  =  dsl. cases[ begincasenum, dsl. varlist [ 04AboveF ']. index][0] 

127  referencepop[12]  =  dsl. cases [beginc as enum, dsl. varlist [ 'GSIGSBM1 ]. index ][©] 

128  ref  erencepop[  13]  =  dsl.  cases[ begincasenum,  dsl.  varlist  [  GS1GS8F  '].  index]  [6] 

129  referent  epop[  14  ]  =  ds  1 .  c  as  es  [  begi  nc  as  enyit  ds  1 .  v  arl  i  st  [  '  GS9_SE  S*1  ] .  i  ndex  ]  [  ©■] 

1 36  referent  epop[  15]  =  ds  1 .  c  as  es  [  begi  nc  as  eny  m  ,  ds  1 .  v  arl  i  st  [  '  GS9_SE  SF  '  ] .  i  ndex  ]  [  ©■] 

1 31  referent  epop[  16]  =  ds  1 .  c  as  es  [  begi  nc  as  eny  m  ,  ds  1 .  v  arl  i  st  [  '  Ot  herM  ] .  i  ndex  ]  [  ©■] 

1 32  referent  epop[  17]  =  ds  1 .  c  as  es  [  begi  nc  as  enum,  ds  1 .  v  arl  i  st  [  '  Ot  herF  ] .  i  ndex  ]  [  ©■] 

133  # - 

134  ref  sum=6.  © 

135  for  element  in  ref erencepop: 

136  refs  urn  =  ref sum+ element 

137  * - 

138  e=© 

139  for  ele  in  referencepop: 

146  if  ref  son  !=6.  ©: 

141  referenceratiofe]  =  ele/ref sum 

142  else: 

143  referenceratio[e]  =  ©.  © 

144  e=e+l 

145  # - 

146  for  d  in  range(len( referencepop)): 

147  if  respondersratio[d]  !=6.  6  and  ref erenc eratio[d]  !=  ©.  ©: 

148  weight [d]  =  referent erat io[d]/respondersrat io[d] 

149  else: 

150  weight  [d]  =  1.6 

151  list b=[dl[ key ][©]]  +  weigh* 

152  weight dic[ key ]=list fa- 


153 

154  print  (  weight  die  ) 

155  f  Create  a  new  dataset  for  each  value  of  the  variable  'unit' 

1 56  wit  h  s  ps  s .  Dat  aSt  ep{  ) : 

1 57  newds  1  =  s  ps  s  .  Oat  as  et  {  n am e  =  H  o-ne  ) 

158  name= newds 1 . name 

159  newdsl . varlist . append^ ' unit  casenunber ' , 6) 
newdsl . varlist . append^ ' unit  id 1 , 0) 

161  for  v  in  range(lS): 

162  t  emplist  =  [ 'weight  r  , v ] 

163  t emplist [1]  =  st r(t emplist [ 1 ] ) 

164  var name  =  j oin(t emplist ) 

newdsl . varlist . append^  varnaiet 0) 

166 

167  for  i,  j  in  weight  die .  it  ens<) : 

168  t emplist =  [  ] 
temp-list  =  [i]  +  j 

newdsl .cases . append{t  emplist ) 

171 

172  spss . Submit ( r . 

173  DATASET  ACTIVATE  V(name)s. 

1 74  SAVE  O UTF  IL  E  =  1  E  :  \UE  I  UAH \My  SPSS\ Test  Files \ wei ght  s_3K(  name )s.sav'  „ 

175  DATASET  CLOSE  ALL. 

176  .  KlocalsO) 

177  End  Program. 
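
The weight computation in Algorithm Component 4-1 can be illustrated outside SPSS. The following is a minimal sketch in plain Python, not the code used in this report. The function name poststrat_weights and the example counts are hypothetical. For each cell it divides the reference-population share by the responder share and keeps the weight 1.0 when either share is zero, as the listing does. Unlike the listing, the sketch also assumes that a unit with no responders at all should receive neutral weights instead of raising a division error.

```python
def poststrat_weights(responder_counts, reference_counts):
    """Return one post-stratification weight per cell.

    weight = (reference share) / (responder share); a cell where either
    share is zero keeps the neutral weight 1.0.
    """
    if len(responder_counts) != len(reference_counts):
        raise ValueError("both lists must have one entry per cell")
    resp_total = float(sum(responder_counts))
    ref_total = float(sum(reference_counts))
    weights = []
    for resp, ref in zip(responder_counts, reference_counts):
        # Assumption: an empty total yields a zero share, not an error.
        resp_share = resp / resp_total if resp_total else 0.0
        ref_share = ref / ref_total if ref_total else 0.0
        if resp_share != 0.0 and ref_share != 0.0:
            weights.append(ref_share / resp_share)
        else:
            weights.append(1.0)
    return weights


# Hypothetical unit with three cells: 10, 30 and 0 responders against
# reference counts of 50, 50 and 20.
example = poststrat_weights([10, 30, 0], [50, 50, 20])
```

In this hypothetical unit the first cell is under-represented among responders and receives a weight above 1, the second is over-represented and receives a weight below 1, and the third has no responders and keeps the weight 1.0.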
Appendix E: Algorithm Component 4-2


output close all.

GET FILE='E:\WEI WAN\My SPSS\TestFiles\AlldatawithoutIndicator.sav'.
DATASET NAME alldata.
SORT CASES BY AFEOCAID.
compute casen = $CASENUM.
formats casen(f12.0).
VARIABLE LEVEL casen (SCALE).
EXECUTE.

BEGIN PROGRAM.
import spss

totalvar = spss.GetVariableCount()
#-works: generate a dictionary {unit number: [unit ID, begincasenum, endcasenum]}-
#make sure you have case number--
with spss.DataStep():
    ds = spss.Dataset(name="alldata")
    i = 1
    dl = dict()
    #-i: unit number--j = [deptid, beginingcasenum, endingcasenum]--make sure you have case number-
    j = [ds.cases[0, 1][0], ds.cases[0, totalvar-1][0], ds.cases[0, totalvar-1][0]]
    dl = {i: j}
    iteration = 0
    for r in ds.cases:
        # a change in unit ID closes the current unit and opens the next one
        if (r[1] != dl[i][0]):
            dl[i][2] = dl[i][1] + iteration - 1
            i = i+1
            newj = [r[1], dl[i-1][2]+1, dl[i-1][2]+1]
            dl[i] = newj
            iteration = 0
        iteration = iteration+1
    dl[i][2] = ds.cases[-1, totalvar-1][0]
#-
weightdic = dict()
for key in dl:

    weight = [1.0]*18
    with spss.DataStep():
        ds1 = spss.Dataset(name="alldata")
        #-
        begincasenum = int(dl[key][1]) - 1
        endcasenum = int(dl[key][2])
        #-working on here-
        # responder counts per Group x gender cell of this unit
        v11,v12,v21,v22,v31,v32,v41,v42,v51,v52,v61,v62,v71,v72,v81,v82,v101,v102 = [0.0]*18
        for row in ds1.cases[begincasenum : endcasenum]:
            if row[ds1.varlist['Group'].index] == 1 and row[ds1.varlist['gender'].index] == 1:
                v11 = v11+1
            if row[ds1.varlist['Group'].index] == 1 and row[ds1.varlist['gender'].index] == 2:
                v12 = v12+1
            if row[ds1.varlist['Group'].index] == 2 and row[ds1.varlist['gender'].index] == 1:
                v21 = v21+1
            if row[ds1.varlist['Group'].index] == 2 and row[ds1.varlist['gender'].index] == 2:
                v22 = v22+1
            if row[ds1.varlist['Group'].index] == 3 and row[ds1.varlist['gender'].index] == 1:
                v31 = v31+1
            if row[ds1.varlist['Group'].index] == 3 and row[ds1.varlist['gender'].index] == 2:
                v32 = v32+1
            if row[ds1.varlist['Group'].index] == 4 and row[ds1.varlist['gender'].index] == 1:
                v41 = v41+1
            if row[ds1.varlist['Group'].index] == 4 and row[ds1.varlist['gender'].index] == 2:
                v42 = v42+1
            if row[ds1.varlist['Group'].index] == 5 and row[ds1.varlist['gender'].index] == 1:
                v51 = v51+1
            if row[ds1.varlist['Group'].index] == 5 and row[ds1.varlist['gender'].index] == 2:
                v52 = v52+1
            if row[ds1.varlist['Group'].index] == 6 and row[ds1.varlist['gender'].index] == 1:
                v61 = v61+1
            if row[ds1.varlist['Group'].index] == 6 and row[ds1.varlist['gender'].index] == 2:
                v62 = v62+1
            if row[ds1.varlist['Group'].index] == 7 and row[ds1.varlist['gender'].index] == 1:
                v71 = v71+1
            if row[ds1.varlist['Group'].index] == 7 and row[ds1.varlist['gender'].index] == 2:
                v72 = v72+1
            if row[ds1.varlist['Group'].index] == 8 and row[ds1.varlist['gender'].index] == 1:
                v81 = v81+1
            if row[ds1.varlist['Group'].index] == 8 and row[ds1.varlist['gender'].index] == 2:
                v82 = v82+1
            # Group 9 cases are added to the Group 8 cells
            if row[ds1.varlist['Group'].index] == 9 and row[ds1.varlist['gender'].index] == 1:
                v81 = v81+1
            if row[ds1.varlist['Group'].index] == 9 and row[ds1.varlist['gender'].index] == 2:
                v82 = v82+1
            if row[ds1.varlist['Group'].index] == 10 and row[ds1.varlist['gender'].index] == 1:
                v101 = v101+1
            if row[ds1.varlist['Group'].index] == 10 and row[ds1.varlist['gender'].index] == 2:
                v102 = v102+1
        print("This is the key: ", key)
        total = float(v11+v12+v21+v22+v31+v32+v41+v42+v51+v52+v61+v62+v71+v72+v81+v82+v101+v102)
        print("The summation of all cells in above crosstable:", total)
        respondernum = [v11, v12, v21, v22, v31, v32, v41, v42, v51, v52, v61, v62, v71, v72, v81, v82, v101, v102]
        respondersratio = [v11/total, v12/total, v21/total, v22/total, v31/total, v32/total, v41/total,
                           v42/total, v51/total, v52/total, v61/total, v62/total, v71/total, v72/total, v81/total,
                           v82/total, v101/total, v102/total]
        print("This is respondersratio: ", respondersratio)
        referencepop = [0.0]*18; referenceratio = [0.0]*18
        # cell shares of the whole reference population (fixed for every unit)
        rratio11 = 0.065167; rratio12 = 0.015172; rratio21 = 0.164276; rratio22 = 0.034353;
        rratio31 = 0.046183; rratio32 = 0.007292; rratio41 = 0.005294; rratio42 = 0.0014788;
        rratio51 = 0.02747; rratio52 = 0.007538;
        rratio61 = 0.026882; rratio62 = 0.005018; rratio71 = 0.0255074; rratio72 = 0.02548;
        rratio81 = 0.35537; rratio82 = 0.149931; rratio101 = 0.029049; rratio102 = 0.008539
        referenceratio = [rratio11, rratio12, rratio21, rratio22, rratio31, rratio32, rratio41, rratio42, rratio51,
                          rratio52, rratio61, rratio62, rratio71, rratio72, rratio81, rratio82, rratio101, rratio102]

        # weight = reference share / responder share; 1.0 when either share is zero
        for d in range(len(referencepop)):
            if respondersratio[d] != 0.0 and referenceratio[d] != 0.0:
                weight[d] = referenceratio[d]/respondersratio[d]
            else:
                weight[d] = 1.0
        listb = [dl[key][0]] + weight
        weightdic[key] = listb

print(weightdic)
# Create a new dataset for each value of the variable 'unit'
with spss.DataStep():
    newds1 = spss.Dataset(name=None)
    name = newds1.name
    newds1.varlist.append('unitcasenumber', 0)
    newds1.varlist.append('unitid', 0)
    for v in range(18):
        templist = ['Pweight', v]
        templist[1] = str(templist[1])
        varname = '_'.join(templist)
        newds1.varlist.append(varname, 0)

    for i, j in weightdic.items():
        templist = []
        templist = [i] + j
        newds1.cases.append(templist)

spss.Submit(r"""
DATASET ACTIVATE %(name)s.
SAVE OUTFILE='E:\WEI WAN\My SPSS\TestFiles\Myweights0703201701_%(name)s.sav'.
DATASET CLOSE ALL.
""" % locals())
End Program.

Appendix F: Algorithm Component 5-1

output close all.

GET FILE='E:\WEI WAN\My SPSS\TestFiles\originaldatawithweights.sav'.
DATASET NAME weighteddata.
EXECUTE.

GET FILE='E:\WEI WAN\My SPSS\TestFiles\weights_xDataset1_unitpop.sav'.
DATASET NAME alldata.
EXECUTE.

BEGIN PROGRAM.
import spss

totalvar = spss.GetVariableCount()
with spss.DataStep():
    ds = spss.Dataset(name="alldata")
    ds1 = spss.Dataset(name="weighteddata")
    ds1.varlist.append('localweight', 0)

    # unit ID -> list of the 18 unit-level weights
    dl = dict()
    for r in ds.cases:
        dl[int(r[1])] = r[2:totalvar]

    totalcase = len(ds1.cases)
    for m in range(totalcase):
        kkey = ds1.cases[m, ds1.varlist['AFEOCAID'].index][0]
        if ds1.cases[m, ds1.varlist['Group'].index][0] == 1 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][0]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 1 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][1]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 2 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][2]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 2 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][3]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 3 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][4]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 3 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][5]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 4 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][6]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 4 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][7]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 5 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][8]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 5 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][9]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 6 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][10]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 6 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][11]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 7 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][12]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 7 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][13]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 8 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][14]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 8 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][15]
        # Group 9 cases keep the neutral weight 1
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 9 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = 1
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 9 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = 1
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 10 and ds1.cases[m, ds1.varlist['gender'].index][0] == 1:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][16]
        elif ds1.cases[m, ds1.varlist['Group'].index][0] == 10 and ds1.cases[m, ds1.varlist['gender'].index][0] == 2:
            ds1.cases[m, ds1.varlist['localweight'].index] = dl[kkey][17]
        else:
            ds1.cases[m, ds1.varlist['localweight'].index] = 1

spss.Submit(r"""
DATASET ACTIVATE weighteddata.
SAVE OUTFILE='E:\WEI WAN\My SPSS\TestFiles\originaldatawithlocalweights.sav'.
DATASET CLOSE ALL.
""")
End Program.
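
The if/elif chain in Algorithm Component 5-1 assigns to each case the weight stored at a fixed position of its unit's 18-element weight list. As an illustration only, that position can be computed from the Group and gender codes. The sketch below is plain Python, not the code used in this report, and the names weight_index and lookup_weight are hypothetical. It assumes Group codes 1 to 10 and gender codes 1 and 2, with Group 9 and any other code receiving the weight 1, as in the listing.

```python
def weight_index(group, gender):
    """Map (Group, gender) codes to a position in the 18-weight list.

    Returns None for Group 9 and for any unexpected or missing code;
    the listing assigns the neutral weight 1 in those cases.
    """
    try:
        g, s = int(group), int(gender)  # SPSS returns numeric codes as floats
    except (TypeError, ValueError):
        return None
    if s not in (1, 2):
        return None
    if 1 <= g <= 8:
        return 2 * (g - 1) + (s - 1)   # positions 0..15
    if g == 10:
        return 16 + (s - 1)            # positions 16 and 17
    return None


def lookup_weight(unit_weights, group, gender):
    """Return the weight for one case, or 1 when no cell applies."""
    idx = weight_index(group, gender)
    return 1 if idx is None else unit_weights[idx]


# Hypothetical weight list for one unit.
weights = [0.5 + k for k in range(18)]
w_case = lookup_weight(weights, 5, 2)   # Group 5, gender 2 -> position 9
```

The same mapping applies to the global weights in Algorithm Component 5-2, which uses identical positions.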
Appendix G: Algorithm Component 5-2


output close all.

GET FILE='E:\WEI WAN\My SPSS\TestFiles\originaldatawithlocalweights.sav'.
DATASET NAME weighteddata.
EXECUTE.

GET FILE='E:\WEI WAN\My SPSS\TestFiles\Myweights0703201701_xDataset1_wholepop.sav'.
DATASET NAME alldata.
EXECUTE.

BEGIN PROGRAM.
import spss

totalvar = spss.GetVariableCount()
with spss.DataStep():
    ds = spss.Dataset(name="alldata")
    ds1 = spss.Dataset(name="weighteddata")
    ds1.varlist.append('globalweight', 0)

    # unit ID -> list of the 18 whole-population weights
    dl = dict()
    for r in ds.cases:
        dl[int(r[1])] = r[2:totalvar]

    totalcase = len(ds1.cases)
    for m in range(totalcase):
        kkey = ds1.cases[m, ds1.varlist['AFEOCAID'].index][0]
        groupvalue = ds1.cases[m, ds1.varlist['Group'].index][0]
        gendervalue = ds1.cases[m, ds1.varlist['gender'].index][0]
        if groupvalue == 1 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][0]
        elif groupvalue == 1 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][1]
        elif groupvalue == 2 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][2]
        elif groupvalue == 2 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][3]
        elif groupvalue == 3 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][4]
        elif groupvalue == 3 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][5]
        elif groupvalue == 4 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][6]
        elif groupvalue == 4 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][7]
        elif groupvalue == 5 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][8]
        elif groupvalue == 5 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][9]
        elif groupvalue == 6 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][10]
        elif groupvalue == 6 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][11]
        elif groupvalue == 7 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][12]
        elif groupvalue == 7 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][13]
        elif groupvalue == 8 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][14]
        elif groupvalue == 8 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][15]
        # Group 9 cases keep the neutral weight 1
        elif groupvalue == 9 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = 1
        elif groupvalue == 9 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = 1
        elif groupvalue == 10 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][16]
        elif groupvalue == 10 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['globalweight'].index] = dl[kkey][17]
        else:
            ds1.cases[m, ds1.varlist['globalweight'].index] = 1

spss.Submit(r"""
DATASET ACTIVATE weighteddata.
SAVE OUTFILE='E:\WEI WAN\My SPSS\TestFiles\originaldatawithlocalandglobalweights.sav'.
DATASET CLOSE ALL.
""")
End Program.
Appendix H: Algorithm Component 5-3

output close all.

GET FILE='E:\WEI WAN\My SPSS\TestFiles\originaldatawithlocalandglobalweights.sav'.
DATASET NAME weighteddata.
EXECUTE.

GET FILE='E:\WEI WAN\My SPSS\unitfile3\NodeIDbyProb_07032017.sav'.
DATASET NAME alldata.
EXECUTE.

BEGIN PROGRAM.
import spss

totalvar = spss.GetVariableCount()

with spss.DataStep():
    ds = spss.Dataset(name="alldata")
    ds1 = spss.Dataset(name="weighteddata")
    ds1.varlist.append('weightfromdecisiontree', 0)

    # collect the last variable of every case in the node file
    vec = []
    for r in ds.cases:
        vec.append(r[totalvar-1:totalvar][0])

    totalcase = len(ds1.cases)
    for m in range(totalcase):
        groupvalue = ds1.cases[m, ds1.varlist['Group'].index][0]
        gendervalue = ds1.cases[m, ds1.varlist['gender'].index][0]
        if groupvalue == 1 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[6]
        elif groupvalue == 1 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[16]
        elif groupvalue == 2 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[1]
        elif groupvalue == 2 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[11]
        elif groupvalue == 3 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[0]
        elif groupvalue == 3 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[10]
        elif groupvalue == 4 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[7]
        elif groupvalue == 4 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[17]
        elif groupvalue == 5 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[4]
        elif groupvalue == 5 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[14]
        elif groupvalue == 6 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[5]
        elif groupvalue == 6 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[15]
        elif groupvalue == 7 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[2]
        elif groupvalue == 7 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[12]
        elif groupvalue == 8 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[8]
        elif groupvalue == 8 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[18]
        # Group 9 cases keep the neutral weight 1
        elif groupvalue == 9 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = 1
        elif groupvalue == 9 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = 1
        elif groupvalue == 10 and gendervalue == 1:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[3]
        elif groupvalue == 10 and gendervalue == 2:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = vec[13]
        else:
            ds1.cases[m, ds1.varlist['weightfromdecisiontree'].index] = 1

spss.Submit(r"""
DATASET ACTIVATE weighteddata.
SAVE OUTFILE='E:\WEI WAN\My SPSS\TestFiles\originaldatawithlocalandglobalanddecisiontreeweights.sav'.
DATASET CLOSE ALL.
""")
End Program.


